diff mbox series

[RFC] spapr/xive: Allocate vCPU IPIs from local context

Message ID 20210922144120.1277504-1-clg@kaod.org (mailing list archive)
State New, archived
Headers show
Series [RFC] spapr/xive: Allocate vCPU IPIs from local context | expand

Commit Message

Cédric Le Goater Sept. 22, 2021, 2:41 p.m. UTC
When QEMU switches to the XIVE interrupt mode, it creates all possible
guest interrupts at the level of the KVM device. These interrupts are
backed by real HW interrupts from the IPI interrupt pool of the XIVE
controller.

Currently, this is done from the QEMU main thread, which results in
allocating all interrupts from the chip on which QEMU is running. IPIs
are not distributed across the system and the load is not well
balanced across the interrupt controllers.

To improve distribution on the system, we should try to allocate the
underlying HW IPI from the vCPU context. This also benefits to
configurations using CPU pinning. In this case, the HW IPI is
allocated on the local chip, rerouting between interrupt controllers
is reduced and performance improved.

This moves the initialization of the vCPU IPIs from reset time to the
H_INT_SET_SOURCE_CONFIG hcall which is called from the vCPU context.
But this needs some extra checks in the sequences getting and setting
the source states to make sure a valid HW IPI is backing the guest
interrupt. For that, we check if a target was configured in the END in
case of a vCPU IPI.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---

 I have tested different SMT configurations, kernel_irqchip=off/on,
 did some migrations, CPU hotplug, etc. It's not enough and I would
 like more testing but, at least, it is not making anymore the bad
 assumption vCPU id = IPI number.

 Comments ? 

 hw/intc/spapr_xive.c     | 17 +++++++++++++++++
 hw/intc/spapr_xive_kvm.c | 36 +++++++++++++++++++++++++++++++-----
 2 files changed, 48 insertions(+), 5 deletions(-)

Comments

Greg Kurz Sept. 23, 2021, 9:12 a.m. UTC | #1
On Wed, 22 Sep 2021 16:41:20 +0200
Cédric Le Goater <clg@kaod.org> wrote:

> When QEMU switches to the XIVE interrupt mode, it creates all possible
> guest interrupts at the level of the KVM device. These interrupts are
> backed by real HW interrupts from the IPI interrupt pool of the XIVE
> controller.
> 
> Currently, this is done from the QEMU main thread, which results in
> allocating all interrupts from the chip on which QEMU is running. IPIs
> are not distributed across the system and the load is not well
> balanced across the interrupt controllers.
> 
> To improve distribution on the system, we should try to allocate the
> underlying HW IPI from the vCPU context. This also benefits to
> configurations using CPU pinning. In this case, the HW IPI is
> allocated on the local chip, rerouting between interrupt controllers
> is reduced and performance improved.
> 
> This moves the initialization of the vCPU IPIs from reset time to the
> H_INT_SET_SOURCE_CONFIG hcall which is called from the vCPU context.
> But this needs some extra checks in the sequences getting and setting
> the source states to make sure a valid HW IPI is backing the guest
> interrupt. For that, we check if a target was configured in the END in
> case of a vCPU IPI.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
> 
>  I have tested different SMT configurations, kernel_irqchip=off/on,
>  did some migrations, CPU hotplug, etc. It's not enough and I would
>  like more testing but, at least, it is not making anymore the bad
>  assumption vCPU id = IPI number.
> 

Yeah, the IPI number is provided by the guest, so h_int_set_source_config()
is really the only place where we can know the IPI number of a given vCPU.

>  Comments ? 
> 

LGTM but I didn't check if more users of xive_end_is_valid() should
be converted to using xive_source_is_initialized().

Any chance you have some perf numbers to share ?

>  hw/intc/spapr_xive.c     | 17 +++++++++++++++++
>  hw/intc/spapr_xive_kvm.c | 36 +++++++++++++++++++++++++++++++-----
>  2 files changed, 48 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 6f31ce74f198..2dc594a720b1 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -1089,6 +1089,23 @@ static target_ulong h_int_set_source_config(PowerPCCPU *cpu,
>      if (spapr_xive_in_kernel(xive)) {
>          Error *local_err = NULL;
>  
> +        /*
> +         * Initialize the vCPU IPIs from the vCPU context to allocate
> +         * the backing HW IPI on the local chip. This improves
> +         * distribution of the IPIs in the system and when the vCPUs
> +         * are pinned, it reduces rerouting between interrupt
> +         * controllers for better performance.
> +         */
> +        if (lisn < SPAPR_XIRQ_BASE) {
> +            XiveSource *xsrc = &xive->source;
> +
> +            kvmppc_xive_source_reset_one(xsrc, lisn, &local_err);
> +            if (local_err) {
> +                error_report_err(local_err);
> +                return H_HARDWARE;
> +            }
> +        }
> +
>          kvmppc_xive_set_source_config(xive, lisn, &new_eas, &local_err);
>          if (local_err) {
>              error_report_err(local_err);
> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> index 53731d158625..1607a59e9483 100644
> --- a/hw/intc/spapr_xive_kvm.c
> +++ b/hw/intc/spapr_xive_kvm.c
> @@ -254,7 +254,12 @@ static int kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp)
>      SpaprXive *xive = SPAPR_XIVE(xsrc->xive);
>      int i;
>  
> -    for (i = 0; i < xsrc->nr_irqs; i++) {
> +    /*
> +     * vCPU IPIs are initialized at the KVM level when configured by
> +     * H_INT_SET_SOURCE_CONFIG.
> +     */
> +
> +    for (i = SPAPR_XIRQ_BASE; i < xsrc->nr_irqs; i++) {
>          int ret;
>  
>          if (!xive_eas_is_valid(&xive->eat[i])) {
> @@ -342,6 +347,27 @@ uint64_t kvmppc_xive_esb_rw(XiveSource *xsrc, int srcno, uint32_t offset,
>      }
>  }
>  
> +static bool xive_source_is_initialized(SpaprXive *xive, int lisn)
> +{
> +    assert(lisn < xive->nr_irqs);
> +
> +    if (!xive_eas_is_valid(&xive->eat[lisn])) {
> +        return false;
> +    }
> +
> +    /*
> +     * vCPU IPIs are initialized at the KVM level when configured by
> +     * H_INT_SET_SOURCE_CONFIG, in which case, we should have a valid
> +     * target (server, priority) in the END.
> +     */
> +    if (lisn < SPAPR_XIRQ_BASE) {
> +        return !!xive_get_field64(EAS_END_INDEX, xive->eat[lisn].w);
> +    }
> +
> +    /* Device sources */
> +    return true;
> +}
> +
>  static void kvmppc_xive_source_get_state(XiveSource *xsrc)
>  {
>      SpaprXive *xive = SPAPR_XIVE(xsrc->xive);
> @@ -350,7 +376,7 @@ static void kvmppc_xive_source_get_state(XiveSource *xsrc)
>      for (i = 0; i < xsrc->nr_irqs; i++) {
>          uint8_t pq;
>  
> -        if (!xive_eas_is_valid(&xive->eat[i])) {
> +        if (!xive_source_is_initialized(xive, i)) {
>              continue;
>          }
>  
> @@ -533,7 +559,7 @@ static void kvmppc_xive_change_state_handler(void *opaque, bool running,
>              uint8_t pq;
>              uint8_t old_pq;
>  
> -            if (!xive_eas_is_valid(&xive->eat[i])) {
> +            if (!xive_source_is_initialized(xive, i)) {
>                  continue;
>              }
>  
> @@ -561,7 +587,7 @@ static void kvmppc_xive_change_state_handler(void *opaque, bool running,
>      for (i = 0; i < xsrc->nr_irqs; i++) {
>          uint8_t pq;
>  
> -        if (!xive_eas_is_valid(&xive->eat[i])) {
> +        if (!xive_source_is_initialized(xive, i)) {
>              continue;
>          }
>  
> @@ -666,7 +692,7 @@ int kvmppc_xive_post_load(SpaprXive *xive, int version_id)
>  
>      /* Restore the EAT */
>      for (i = 0; i < xive->nr_irqs; i++) {
> -        if (!xive_eas_is_valid(&xive->eat[i])) {
> +        if (!xive_source_is_initialized(xive, i)) {
>              continue;
>          }
>
Cédric Le Goater Sept. 24, 2021, 12:40 p.m. UTC | #2
On 9/23/21 11:12, Greg Kurz wrote:
> On Wed, 22 Sep 2021 16:41:20 +0200
> Cédric Le Goater <clg@kaod.org> wrote:
> 
>> When QEMU switches to the XIVE interrupt mode, it creates all possible
>> guest interrupts at the level of the KVM device. These interrupts are
>> backed by real HW interrupts from the IPI interrupt pool of the XIVE
>> controller.
>>
>> Currently, this is done from the QEMU main thread, which results in
>> allocating all interrupts from the chip on which QEMU is running. IPIs
>> are not distributed across the system and the load is not well
>> balanced across the interrupt controllers.
>>
>> To improve distribution on the system, we should try to allocate the
>> underlying HW IPI from the vCPU context. This also benefits to
>> configurations using CPU pinning. In this case, the HW IPI is
>> allocated on the local chip, rerouting between interrupt controllers
>> is reduced and performance improved.
>>
>> This moves the initialization of the vCPU IPIs from reset time to the
>> H_INT_SET_SOURCE_CONFIG hcall which is called from the vCPU context.
>> But this needs some extra checks in the sequences getting and setting
>> the source states to make sure a valid HW IPI is backing the guest
>> interrupt. For that, we check if a target was configured in the END in
>> case of a vCPU IPI.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>
>>   I have tested different SMT configurations, kernel_irqchip=off/on,
>>   did some migrations, CPU hotplug, etc. It's not enough and I would
>>   like more testing but, at least, it is not making anymore the bad
>>   assumption vCPU id = IPI number.
>>
> 
> Yeah, the IPI number is provided by the guest, so h_int_set_source_config()
> is really the only place where we can know the IPI number of a given vCPU.

The patch lacks a run_on_cpu() to perform the reset on the vCPU context
to be complete.

> 
>>   Comments ?
>>
> 
> LGTM but I didn't check if more users of xive_end_is_valid() should
> be converted to using xive_source_is_initialized().

I think you mean xive_eas_is_valid() ?

The changes only impact KVM support since we are deferring the IRQ
initialization at the KVM level. What we have to be careful about is not
accessing an ESB page of an interrupt that would not have been initiliazed
in the KVM device, else QEMU gets a sigbus.

That only happens when QEMU gets/sets the ESB states.
  
> Any chance you have some perf numbers to share ?

I will try.

Thanks,

C.

  
>>   hw/intc/spapr_xive.c     | 17 +++++++++++++++++
>>   hw/intc/spapr_xive_kvm.c | 36 +++++++++++++++++++++++++++++++-----
>>   2 files changed, 48 insertions(+), 5 deletions(-)
>>
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index 6f31ce74f198..2dc594a720b1 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -1089,6 +1089,23 @@ static target_ulong h_int_set_source_config(PowerPCCPU *cpu,
>>       if (spapr_xive_in_kernel(xive)) {
>>           Error *local_err = NULL;
>>   
>> +        /*
>> +         * Initialize the vCPU IPIs from the vCPU context to allocate
>> +         * the backing HW IPI on the local chip. This improves
>> +         * distribution of the IPIs in the system and when the vCPUs
>> +         * are pinned, it reduces rerouting between interrupt
>> +         * controllers for better performance.
>> +         */
>> +        if (lisn < SPAPR_XIRQ_BASE) {
>> +            XiveSource *xsrc = &xive->source;
>> +
>> +            kvmppc_xive_source_reset_one(xsrc, lisn, &local_err);
>> +            if (local_err) {
>> +                error_report_err(local_err);
>> +                return H_HARDWARE;
>> +            }
>> +        }
>> +
>>           kvmppc_xive_set_source_config(xive, lisn, &new_eas, &local_err);
>>           if (local_err) {
>>               error_report_err(local_err);
>> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
>> index 53731d158625..1607a59e9483 100644
>> --- a/hw/intc/spapr_xive_kvm.c
>> +++ b/hw/intc/spapr_xive_kvm.c
>> @@ -254,7 +254,12 @@ static int kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp)
>>       SpaprXive *xive = SPAPR_XIVE(xsrc->xive);
>>       int i;
>>   
>> -    for (i = 0; i < xsrc->nr_irqs; i++) {
>> +    /*
>> +     * vCPU IPIs are initialized at the KVM level when configured by
>> +     * H_INT_SET_SOURCE_CONFIG.
>> +     */
>> +
>> +    for (i = SPAPR_XIRQ_BASE; i < xsrc->nr_irqs; i++) {
>>           int ret;
>>   
>>           if (!xive_eas_is_valid(&xive->eat[i])) {
>> @@ -342,6 +347,27 @@ uint64_t kvmppc_xive_esb_rw(XiveSource *xsrc, int srcno, uint32_t offset,
>>       }
>>   }
>>   
>> +static bool xive_source_is_initialized(SpaprXive *xive, int lisn)
>> +{
>> +    assert(lisn < xive->nr_irqs);
>> +
>> +    if (!xive_eas_is_valid(&xive->eat[lisn])) {
>> +        return false;
>> +    }
>> +
>> +    /*
>> +     * vCPU IPIs are initialized at the KVM level when configured by
>> +     * H_INT_SET_SOURCE_CONFIG, in which case, we should have a valid
>> +     * target (server, priority) in the END.
>> +     */
>> +    if (lisn < SPAPR_XIRQ_BASE) {
>> +        return !!xive_get_field64(EAS_END_INDEX, xive->eat[lisn].w);
>> +    }
>> +
>> +    /* Device sources */
>> +    return true;
>> +}
>> +
>>   static void kvmppc_xive_source_get_state(XiveSource *xsrc)
>>   {
>>       SpaprXive *xive = SPAPR_XIVE(xsrc->xive);
>> @@ -350,7 +376,7 @@ static void kvmppc_xive_source_get_state(XiveSource *xsrc)
>>       for (i = 0; i < xsrc->nr_irqs; i++) {
>>           uint8_t pq;
>>   
>> -        if (!xive_eas_is_valid(&xive->eat[i])) {
>> +        if (!xive_source_is_initialized(xive, i)) {
>>               continue;
>>           }
>>   
>> @@ -533,7 +559,7 @@ static void kvmppc_xive_change_state_handler(void *opaque, bool running,
>>               uint8_t pq;
>>               uint8_t old_pq;
>>   
>> -            if (!xive_eas_is_valid(&xive->eat[i])) {
>> +            if (!xive_source_is_initialized(xive, i)) {
>>                   continue;
>>               }
>>   
>> @@ -561,7 +587,7 @@ static void kvmppc_xive_change_state_handler(void *opaque, bool running,
>>       for (i = 0; i < xsrc->nr_irqs; i++) {
>>           uint8_t pq;
>>   
>> -        if (!xive_eas_is_valid(&xive->eat[i])) {
>> +        if (!xive_source_is_initialized(xive, i)) {
>>               continue;
>>           }
>>   
>> @@ -666,7 +692,7 @@ int kvmppc_xive_post_load(SpaprXive *xive, int version_id)
>>   
>>       /* Restore the EAT */
>>       for (i = 0; i < xive->nr_irqs; i++) {
>> -        if (!xive_eas_is_valid(&xive->eat[i])) {
>> +        if (!xive_source_is_initialized(xive, i)) {
>>               continue;
>>           }
>>   
>
Greg Kurz Sept. 24, 2021, 1:49 p.m. UTC | #3
On Fri, 24 Sep 2021 14:40:24 +0200
Cédric Le Goater <clg@kaod.org> wrote:

> On 9/23/21 11:12, Greg Kurz wrote:
> > On Wed, 22 Sep 2021 16:41:20 +0200
> > Cédric Le Goater <clg@kaod.org> wrote:
> > 
> >> When QEMU switches to the XIVE interrupt mode, it creates all possible
> >> guest interrupts at the level of the KVM device. These interrupts are
> >> backed by real HW interrupts from the IPI interrupt pool of the XIVE
> >> controller.
> >>
> >> Currently, this is done from the QEMU main thread, which results in
> >> allocating all interrupts from the chip on which QEMU is running. IPIs
> >> are not distributed across the system and the load is not well
> >> balanced across the interrupt controllers.
> >>
> >> To improve distribution on the system, we should try to allocate the
> >> underlying HW IPI from the vCPU context. This also benefits to
> >> configurations using CPU pinning. In this case, the HW IPI is
> >> allocated on the local chip, rerouting between interrupt controllers
> >> is reduced and performance improved.
> >>
> >> This moves the initialization of the vCPU IPIs from reset time to the
> >> H_INT_SET_SOURCE_CONFIG hcall which is called from the vCPU context.
> >> But this needs some extra checks in the sequences getting and setting
> >> the source states to make sure a valid HW IPI is backing the guest
> >> interrupt. For that, we check if a target was configured in the END in
> >> case of a vCPU IPI.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >> ---
> >>
> >>   I have tested different SMT configurations, kernel_irqchip=off/on,
> >>   did some migrations, CPU hotplug, etc. It's not enough and I would
> >>   like more testing but, at least, it is not making anymore the bad
> >>   assumption vCPU id = IPI number.
> >>
> > 
> > Yeah, the IPI number is provided by the guest, so h_int_set_source_config()
> > is really the only place where we can know the IPI number of a given vCPU.
> 
> The patch lacks a run_on_cpu() to perform the reset on the vCPU context
> to be complete.
> 

Yes since the vCPU doing the hcall is obviously not the target for the
IPI :-)

> > 
> >>   Comments ?
> >>
> > 
> > LGTM but I didn't check if more users of xive_end_is_valid() should
> > be converted to using xive_source_is_initialized().
> 
> I think you mean xive_eas_is_valid() ?
> 

Oops yes... bad copy/paste :-\

> The changes only impact KVM support since we are deferring the IRQ
> initialization at the KVM level. What we have to be careful about is not
> accessing an ESB page of an interrupt that would not have been initiliazed
> in the KVM device, else QEMU gets a sigbus.
> 

Ok, so this is just needed on a path that leads to xive_esb_rw() or
kvmppc_xive_esb_trigger(), right ?

It seems that

h_int_esb()
 kvmppc_xive_esb_rw()

can get there with an lisn provided by the guest, and I don't see any
such check on the way : h_int_esb() only checks xive_eas_is_valid().

Cheers,

--
Greg

> That only happens when QEMU gets/sets the ESB states.
>   
> > Any chance you have some perf numbers to share ?
> 
> I will try.
> 
> Thanks,
> 
> C.
> 
>   
> >>   hw/intc/spapr_xive.c     | 17 +++++++++++++++++
> >>   hw/intc/spapr_xive_kvm.c | 36 +++++++++++++++++++++++++++++++-----
> >>   2 files changed, 48 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> >> index 6f31ce74f198..2dc594a720b1 100644
> >> --- a/hw/intc/spapr_xive.c
> >> +++ b/hw/intc/spapr_xive.c
> >> @@ -1089,6 +1089,23 @@ static target_ulong h_int_set_source_config(PowerPCCPU *cpu,
> >>       if (spapr_xive_in_kernel(xive)) {
> >>           Error *local_err = NULL;
> >>   
> >> +        /*
> >> +         * Initialize the vCPU IPIs from the vCPU context to allocate
> >> +         * the backing HW IPI on the local chip. This improves
> >> +         * distribution of the IPIs in the system and when the vCPUs
> >> +         * are pinned, it reduces rerouting between interrupt
> >> +         * controllers for better performance.
> >> +         */
> >> +        if (lisn < SPAPR_XIRQ_BASE) {
> >> +            XiveSource *xsrc = &xive->source;
> >> +
> >> +            kvmppc_xive_source_reset_one(xsrc, lisn, &local_err);
> >> +            if (local_err) {
> >> +                error_report_err(local_err);
> >> +                return H_HARDWARE;
> >> +            }
> >> +        }
> >> +
> >>           kvmppc_xive_set_source_config(xive, lisn, &new_eas, &local_err);
> >>           if (local_err) {
> >>               error_report_err(local_err);
> >> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> >> index 53731d158625..1607a59e9483 100644
> >> --- a/hw/intc/spapr_xive_kvm.c
> >> +++ b/hw/intc/spapr_xive_kvm.c
> >> @@ -254,7 +254,12 @@ static int kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp)
> >>       SpaprXive *xive = SPAPR_XIVE(xsrc->xive);
> >>       int i;
> >>   
> >> -    for (i = 0; i < xsrc->nr_irqs; i++) {
> >> +    /*
> >> +     * vCPU IPIs are initialized at the KVM level when configured by
> >> +     * H_INT_SET_SOURCE_CONFIG.
> >> +     */
> >> +
> >> +    for (i = SPAPR_XIRQ_BASE; i < xsrc->nr_irqs; i++) {
> >>           int ret;
> >>   
> >>           if (!xive_eas_is_valid(&xive->eat[i])) {
> >> @@ -342,6 +347,27 @@ uint64_t kvmppc_xive_esb_rw(XiveSource *xsrc, int srcno, uint32_t offset,
> >>       }
> >>   }
> >>   
> >> +static bool xive_source_is_initialized(SpaprXive *xive, int lisn)
> >> +{
> >> +    assert(lisn < xive->nr_irqs);
> >> +
> >> +    if (!xive_eas_is_valid(&xive->eat[lisn])) {
> >> +        return false;
> >> +    }
> >> +
> >> +    /*
> >> +     * vCPU IPIs are initialized at the KVM level when configured by
> >> +     * H_INT_SET_SOURCE_CONFIG, in which case, we should have a valid
> >> +     * target (server, priority) in the END.
> >> +     */
> >> +    if (lisn < SPAPR_XIRQ_BASE) {
> >> +        return !!xive_get_field64(EAS_END_INDEX, xive->eat[lisn].w);
> >> +    }
> >> +
> >> +    /* Device sources */
> >> +    return true;
> >> +}
> >> +
> >>   static void kvmppc_xive_source_get_state(XiveSource *xsrc)
> >>   {
> >>       SpaprXive *xive = SPAPR_XIVE(xsrc->xive);
> >> @@ -350,7 +376,7 @@ static void kvmppc_xive_source_get_state(XiveSource *xsrc)
> >>       for (i = 0; i < xsrc->nr_irqs; i++) {
> >>           uint8_t pq;
> >>   
> >> -        if (!xive_eas_is_valid(&xive->eat[i])) {
> >> +        if (!xive_source_is_initialized(xive, i)) {
> >>               continue;
> >>           }
> >>   
> >> @@ -533,7 +559,7 @@ static void kvmppc_xive_change_state_handler(void *opaque, bool running,
> >>               uint8_t pq;
> >>               uint8_t old_pq;
> >>   
> >> -            if (!xive_eas_is_valid(&xive->eat[i])) {
> >> +            if (!xive_source_is_initialized(xive, i)) {
> >>                   continue;
> >>               }
> >>   
> >> @@ -561,7 +587,7 @@ static void kvmppc_xive_change_state_handler(void *opaque, bool running,
> >>       for (i = 0; i < xsrc->nr_irqs; i++) {
> >>           uint8_t pq;
> >>   
> >> -        if (!xive_eas_is_valid(&xive->eat[i])) {
> >> +        if (!xive_source_is_initialized(xive, i)) {
> >>               continue;
> >>           }
> >>   
> >> @@ -666,7 +692,7 @@ int kvmppc_xive_post_load(SpaprXive *xive, int version_id)
> >>   
> >>       /* Restore the EAT */
> >>       for (i = 0; i < xive->nr_irqs; i++) {
> >> -        if (!xive_eas_is_valid(&xive->eat[i])) {
> >> +        if (!xive_source_is_initialized(xive, i)) {
> >>               continue;
> >>           }
> >>   
> > 
>
Cédric Le Goater Sept. 24, 2021, 2:58 p.m. UTC | #4
[ ... ]

>> The changes only impact KVM support since we are deferring the IRQ
>> initialization at the KVM level. What we have to be careful about is not
>> accessing an ESB page of an interrupt that would not have been initiliazed
>> in the KVM device, else QEMU gets a sigbus.
>>
> 
> Ok, so this is just needed on a path that leads to xive_esb_rw() or
> kvmppc_xive_esb_trigger(), right ?
> 
> It seems that
> 
> h_int_esb()
>   kvmppc_xive_esb_rw()
> 
> can get there with an lisn provided by the guest, and I don't see any
> such check on the way : h_int_esb() only checks xive_eas_is_valid().

This call is for LSI interrupts (device only) and not vCPU IPIs. see hcall
h_int_get_source_info(). I agree a hcall fuzzer could reach it.

An alternative solution would be to claim the vCPU IPIs on demand, like
we do for the MSIs, and not in spapr_irq_init(). It's a much bigger change
tough, because the H_INT hcalls consider that the interrupt numbers have
been claimed.

C.
Greg Kurz Sept. 24, 2021, 5:13 p.m. UTC | #5
On Fri, 24 Sep 2021 16:58:00 +0200
Cédric Le Goater <clg@kaod.org> wrote:

> [ ... ]
> 
> >> The changes only impact KVM support since we are deferring the IRQ
> >> initialization at the KVM level. What we have to be careful about is not
> >> accessing an ESB page of an interrupt that would not have been initiliazed
> >> in the KVM device, else QEMU gets a sigbus.
> >>
> > 
> > Ok, so this is just needed on a path that leads to xive_esb_rw() or
> > kvmppc_xive_esb_trigger(), right ?
> > 
> > It seems that
> > 
> > h_int_esb()
> >   kvmppc_xive_esb_rw()
> > 
> > can get there with an lisn provided by the guest, and I don't see any
> > such check on the way : h_int_esb() only checks xive_eas_is_valid().
> 
> This call is for LSI interrupts (device only) and not vCPU IPIs. see hcall
> h_int_get_source_info(). I agree a hcall fuzzer could reach it.
> 

Yes we tell the guest to use H_INT_ESB for LSIs only but I don't have
the impression that it is forbidden for the guest to call H_INT_ESB
for arbitrary interrupts.

> An alternative solution would be to claim the vCPU IPIs on demand, like
> we do for the MSIs, and not in spapr_irq_init(). It's a much bigger change
> tough, because the H_INT hcalls consider that the interrupt numbers have
> been claimed.
> 

Maybe it would be simpler to call xive_source_is_initialized() instead of
xive_eas_is_valid() in cases like this, e.g. hcall code since it is shared
between emulation and KVM ?

> C.
Cédric Le Goater Sept. 27, 2021, 4:50 p.m. UTC | #6
On 9/24/21 19:13, Greg Kurz wrote:
> On Fri, 24 Sep 2021 16:58:00 +0200
> Cédric Le Goater <clg@kaod.org> wrote:
> 
>> [ ... ]
>>
>>>> The changes only impact KVM support since we are deferring the IRQ
>>>> initialization at the KVM level. What we have to be careful about is not
>>>> accessing an ESB page of an interrupt that would not have been initiliazed
>>>> in the KVM device, else QEMU gets a sigbus.
>>>>
>>>
>>> Ok, so this is just needed on a path that leads to xive_esb_rw() or
>>> kvmppc_xive_esb_trigger(), right ?
>>>
>>> It seems that
>>>
>>> h_int_esb()
>>>    kvmppc_xive_esb_rw()
>>>
>>> can get there with an lisn provided by the guest, and I don't see any
>>> such check on the way : h_int_esb() only checks xive_eas_is_valid().
>>
>> This call is for LSI interrupts (device only) and not vCPU IPIs. see hcall
>> h_int_get_source_info(). I agree a hcall fuzzer could reach it.
>>
> 
> Yes we tell the guest to use H_INT_ESB for LSIs only but I don't have
> the impression that it is forbidden for the guest to call H_INT_ESB
> for arbitrary interrupts.
> 
>> An alternative solution would be to claim the vCPU IPIs on demand, like
>> we do for the MSIs, and not in spapr_irq_init(). It's a much bigger change
>> tough, because the H_INT hcalls consider that the interrupt numbers have
>> been claimed.
>>
> 
> Maybe it would be simpler to call xive_source_is_initialized() instead of
> xive_eas_is_valid() in cases like this, e.g. hcall code since it is shared
> between emulation and KVM ?

Yes but we need a better check than :

     if (lisn < SPAPR_XIRQ_BASE) {
         return !!xive_get_field64(EAS_END_INDEX, xive->eat[lisn].w);
     }

This is more an heuristic than a precise test on the "validity" of
a source at the KVM level.


Thanks,

C.
Greg Kurz Sept. 28, 2021, 7:19 a.m. UTC | #7
On Mon, 27 Sep 2021 18:50:40 +0200
Cédric Le Goater <clg@kaod.org> wrote:

> On 9/24/21 19:13, Greg Kurz wrote:
> > On Fri, 24 Sep 2021 16:58:00 +0200
> > Cédric Le Goater <clg@kaod.org> wrote:
> > 
> >> [ ... ]
> >>
> >>>> The changes only impact KVM support since we are deferring the IRQ
> >>>> initialization at the KVM level. What we have to be careful about is not
> >>>> accessing an ESB page of an interrupt that would not have been initiliazed
> >>>> in the KVM device, else QEMU gets a sigbus.
> >>>>
> >>>
> >>> Ok, so this is just needed on a path that leads to xive_esb_rw() or
> >>> kvmppc_xive_esb_trigger(), right ?
> >>>
> >>> It seems that
> >>>
> >>> h_int_esb()
> >>>    kvmppc_xive_esb_rw()
> >>>
> >>> can get there with an lisn provided by the guest, and I don't see any
> >>> such check on the way : h_int_esb() only checks xive_eas_is_valid().
> >>
> >> This call is for LSI interrupts (device only) and not vCPU IPIs. see hcall
> >> h_int_get_source_info(). I agree a hcall fuzzer could reach it.
> >>
> > 
> > Yes we tell the guest to use H_INT_ESB for LSIs only but I don't have
> > the impression that it is forbidden for the guest to call H_INT_ESB
> > for arbitrary interrupts.
> > 
> >> An alternative solution would be to claim the vCPU IPIs on demand, like
> >> we do for the MSIs, and not in spapr_irq_init(). It's a much bigger change
> >> tough, because the H_INT hcalls consider that the interrupt numbers have
> >> been claimed.
> >>
> > 
> > Maybe it would be simpler to call xive_source_is_initialized() instead of
> > xive_eas_is_valid() in cases like this, e.g. hcall code since it is shared
> > between emulation and KVM ?
> 
> Yes but we need a better check than :
> 
>      if (lisn < SPAPR_XIRQ_BASE) {
>          return !!xive_get_field64(EAS_END_INDEX, xive->eat[lisn].w);
>      }
> 
> This is more an heuristic than a precise test on the "validity" of
> a source at the KVM level.
> 

I guess we should be able to get kvmppc_xive_irq_state::valid from
KVM by making the KVM_DEV_XIVE_GRP_SOURCE attribute readable. Would
that be enough ?

> 
> Thanks,
> 
> C.
Cédric Le Goater Oct. 1, 2021, 9:59 a.m. UTC | #8
>>> Maybe it would be simpler to call xive_source_is_initialized() instead of
>>> xive_eas_is_valid() in cases like this, e.g. hcall code since it is shared
>>> between emulation and KVM ?
>>
>> Yes but we need a better check than :
>>
>>       if (lisn < SPAPR_XIRQ_BASE) {
>>           return !!xive_get_field64(EAS_END_INDEX, xive->eat[lisn].w);
>>       }
>>
>> This is more an heuristic than a precise test on the "validity" of
>> a source at the KVM level.
>>
> 
> I guess we should be able to get kvmppc_xive_irq_state::valid from
> KVM by making the KVM_DEV_XIVE_GRP_SOURCE attribute readable. Would
> that be enough ?

A bit slow may be. Or an extra bit in the XiveSource 'status' field
to reflect KVM initialization.

C.
Greg Kurz Oct. 1, 2021, 10:38 a.m. UTC | #9
On Fri, 1 Oct 2021 11:59:45 +0200
Cédric Le Goater <clg@kaod.org> wrote:

> >>> Maybe it would be simpler to call xive_source_is_initialized() instead of
> >>> xive_eas_is_valid() in cases like this, e.g. hcall code since it is shared
> >>> between emulation and KVM ?
> >>
> >> Yes but we need a better check than :
> >>
> >>       if (lisn < SPAPR_XIRQ_BASE) {
> >>           return !!xive_get_field64(EAS_END_INDEX, xive->eat[lisn].w);
> >>       }
> >>
> >> This is more an heuristic than a precise test on the "validity" of
> >> a source at the KVM level.
> >>
> > 
> > I guess we should be able to get kvmppc_xive_irq_state::valid from
> > KVM by making the KVM_DEV_XIVE_GRP_SOURCE attribute readable. Would
> > that be enough ?
> 
> A bit slow may be. Or an extra bit in the XiveSource 'status' field
> to reflect KVM initialization.
> 
> C.
> 

Something like this ?

diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index 42f72b68fc00..6eba58b0ff10 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -232,6 +232,7 @@ int kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp)
 {
     SpaprXive *xive = SPAPR_XIVE(xsrc->xive);
     uint64_t state = 0;
+    int ret;
 
     assert(xive->fd != -1);
 
@@ -242,8 +243,15 @@ int kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp)
         }
     }
 
-    return kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_SOURCE, srcno, &state,
-                             true, errp);
+    ret = kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_SOURCE, srcno, &state,
+                            true, errp);
+    if (ret < 0) {
+        return ret;
+    }
+
+    xsrc->status[srcno] |= XIVE_STATUS_KVM;
+
+    return 0;
 }
 
 static int kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp)
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 445eccfe6b73..d85520beb64a 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -252,6 +252,7 @@ static inline hwaddr xive_source_esb_mgmt(XiveSource *xsrc, int srcno)
  * When doing an EOI, the Q bit will indicate if the interrupt
  * needs to be re-triggered.
  */
+#define XIVE_STATUS_KVM       0x8  /* Set when initialized in KVM */
 #define XIVE_STATUS_ASSERTED  0x4  /* Extra bit for LSI */
 #define XIVE_ESB_VAL_P        0x2
 #define XIVE_ESB_VAL_Q        0x1
Cédric Le Goater Oct. 1, 2021, 11:11 a.m. UTC | #10
On 10/1/21 12:38, Greg Kurz wrote:
> On Fri, 1 Oct 2021 11:59:45 +0200
> Cédric Le Goater <clg@kaod.org> wrote:
> 
>>>>> Maybe it would be simpler to call xive_source_is_initialized() instead of
>>>>> xive_eas_is_valid() in cases like this, e.g. hcall code since it is shared
>>>>> between emulation and KVM ?
>>>>
>>>> Yes but we need a better check than :
>>>>
>>>>        if (lisn < SPAPR_XIRQ_BASE) {
>>>>            return !!xive_get_field64(EAS_END_INDEX, xive->eat[lisn].w);
>>>>        }
>>>>
>>>> This is more an heuristic than a precise test on the "validity" of
>>>> a source at the KVM level.
>>>>
>>>
>>> I guess we should be able to get kvmppc_xive_irq_state::valid from
>>> KVM by making the KVM_DEV_XIVE_GRP_SOURCE attribute readable. Would
>>> that be enough ?
>>
>> A bit slow may be. Or an extra bit in the XiveSource 'status' field
>> to reflect KVM initialization.
>>
>> C.
>>
> 
> Something like this ?

Yes.

we need to add is/set/clr wrappers for the ASSERTED bit first. It
would clarify the code. Then add the KVM bit.

C.

> 
> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> index 42f72b68fc00..6eba58b0ff10 100644
> --- a/hw/intc/spapr_xive_kvm.c
> +++ b/hw/intc/spapr_xive_kvm.c
> @@ -232,6 +232,7 @@ int kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp)
>   {
>       SpaprXive *xive = SPAPR_XIVE(xsrc->xive);
>       uint64_t state = 0;
> +    int ret;
>   
>       assert(xive->fd != -1);
>   
> @@ -242,8 +243,15 @@ int kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp)
>           }
>       }
>   
> -    return kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_SOURCE, srcno, &state,
> -                             true, errp);
> +    ret = kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_SOURCE, srcno, &state,
> +                            true, errp);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    xsrc->status[srcno] |= XIVE_STATUS_KVM;
> +
> +    return 0;
>   }
>   
>   static int kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp)
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index 445eccfe6b73..d85520beb64a 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -252,6 +252,7 @@ static inline hwaddr xive_source_esb_mgmt(XiveSource *xsrc, int srcno)
>    * When doing an EOI, the Q bit will indicate if the interrupt
>    * needs to be re-triggered.
>    */
> +#define XIVE_STATUS_KVM       0x8  /* Set when initialized in KVM */
>   #define XIVE_STATUS_ASSERTED  0x4  /* Extra bit for LSI */
>   #define XIVE_ESB_VAL_P        0x2
>   #define XIVE_ESB_VAL_Q        0x1
>
diff mbox series

Patch

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 6f31ce74f198..2dc594a720b1 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -1089,6 +1089,23 @@  static target_ulong h_int_set_source_config(PowerPCCPU *cpu,
     if (spapr_xive_in_kernel(xive)) {
         Error *local_err = NULL;
 
+        /*
+         * Initialize the vCPU IPIs from the vCPU context to allocate
+         * the backing HW IPI on the local chip. This improves
+         * distribution of the IPIs in the system and when the vCPUs
+         * are pinned, it reduces rerouting between interrupt
+         * controllers for better performance.
+         */
+        if (lisn < SPAPR_XIRQ_BASE) {
+            XiveSource *xsrc = &xive->source;
+
+            kvmppc_xive_source_reset_one(xsrc, lisn, &local_err);
+            if (local_err) {
+                error_report_err(local_err);
+                return H_HARDWARE;
+            }
+        }
+
         kvmppc_xive_set_source_config(xive, lisn, &new_eas, &local_err);
         if (local_err) {
             error_report_err(local_err);
diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index 53731d158625..1607a59e9483 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -254,7 +254,12 @@  static int kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp)
     SpaprXive *xive = SPAPR_XIVE(xsrc->xive);
     int i;
 
-    for (i = 0; i < xsrc->nr_irqs; i++) {
+    /*
+     * vCPU IPIs are initialized at the KVM level when configured by
+     * H_INT_SET_SOURCE_CONFIG.
+     */
+
+    for (i = SPAPR_XIRQ_BASE; i < xsrc->nr_irqs; i++) {
         int ret;
 
         if (!xive_eas_is_valid(&xive->eat[i])) {
@@ -342,6 +347,27 @@  uint64_t kvmppc_xive_esb_rw(XiveSource *xsrc, int srcno, uint32_t offset,
     }
 }
 
+static bool xive_source_is_initialized(SpaprXive *xive, int lisn)
+{
+    assert(lisn < xive->nr_irqs);
+
+    if (!xive_eas_is_valid(&xive->eat[lisn])) {
+        return false;
+    }
+
+    /*
+     * vCPU IPIs are initialized at the KVM level when configured by
+     * H_INT_SET_SOURCE_CONFIG, in which case, we should have a valid
+     * target (server, priority) in the END.
+     */
+    if (lisn < SPAPR_XIRQ_BASE) {
+        return !!xive_get_field64(EAS_END_INDEX, xive->eat[lisn].w);
+    }
+
+    /* Device sources */
+    return true;
+}
+
 static void kvmppc_xive_source_get_state(XiveSource *xsrc)
 {
     SpaprXive *xive = SPAPR_XIVE(xsrc->xive);
@@ -350,7 +376,7 @@  static void kvmppc_xive_source_get_state(XiveSource *xsrc)
     for (i = 0; i < xsrc->nr_irqs; i++) {
         uint8_t pq;
 
-        if (!xive_eas_is_valid(&xive->eat[i])) {
+        if (!xive_source_is_initialized(xive, i)) {
             continue;
         }
 
@@ -533,7 +559,7 @@  static void kvmppc_xive_change_state_handler(void *opaque, bool running,
             uint8_t pq;
             uint8_t old_pq;
 
-            if (!xive_eas_is_valid(&xive->eat[i])) {
+            if (!xive_source_is_initialized(xive, i)) {
                 continue;
             }
 
@@ -561,7 +587,7 @@  static void kvmppc_xive_change_state_handler(void *opaque, bool running,
     for (i = 0; i < xsrc->nr_irqs; i++) {
         uint8_t pq;
 
-        if (!xive_eas_is_valid(&xive->eat[i])) {
+        if (!xive_source_is_initialized(xive, i)) {
             continue;
         }
 
@@ -666,7 +692,7 @@  int kvmppc_xive_post_load(SpaprXive *xive, int version_id)
 
     /* Restore the EAT */
     for (i = 0; i < xive->nr_irqs; i++) {
-        if (!xive_eas_is_valid(&xive->eat[i])) {
+        if (!xive_source_is_initialized(xive, i)) {
             continue;
         }