[v9,20/28] ARM: GICv3: handle unmapped LPIs

Message ID	20170511175340.8448-21-andre.przywara@arm.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <xen-devel-bounces@lists.xen.org> From: Andre Przywara <andre.przywara@arm.com> To: Julien Grall <julien.grall@arm.com>, Stefano Stabellini <sstabellini@kernel.org> Date: Thu, 11 May 2017 18:53:32 +0100 Message-Id: <20170511175340.8448-21-andre.przywara@arm.com> In-Reply-To: <20170511175340.8448-1-andre.przywara@arm.com> References: <20170511175340.8448-1-andre.przywara@arm.com> Cc: xen-devel@lists.xenproject.org, Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>, Vijay Kilari <vijay.kilari@gmail.com>, Shanker Donthineni <shankerd@codeaurora.org> Subject: [Xen-devel] [PATCH v9 20/28] ARM: GICv3: handle unmapped LPIs Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" <xen-devel-bounces@lists.xen.org>

Andre Przywara May 11, 2017, 5:53 p.m. UTC

When LPIs get unmapped by a guest, they might still be in some LR of
some VCPU. Nevertheless we remove the corresponding pending_irq
(possibly freeing it), and detect this case (irq_to_pending() returns
NULL) when the LR gets cleaned up later.
However a *new* LPI may get mapped with the same number while the old
LPI is *still* in some LR. To avoid getting the wrong state, we mark
every newly mapped LPI as PRISTINE, which means: has never been in an
LR before. If we detect the LPI in an LR anyway, it must have been an
older one, which we can simply retire.
Before inserting such a PRISTINE LPI into an LR, we must make sure that
it's not already in another LR, as the architecture forbids two
interrupts with the same virtual IRQ number on one CPU.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 xen/arch/arm/gic.c         | 55 +++++++++++++++++++++++++++++++++++++++++-----
 xen/include/asm-arm/vgic.h |  6 +++++
 2 files changed, 56 insertions(+), 5 deletions(-)

Julien Grall May 17, 2017, 6:37 p.m. UTC | #1

Hi Andre,

On 11/05/17 18:53, Andre Przywara wrote:
> When LPIs get unmapped by a guest, they might still be in some LR of
> some VCPU. Nevertheless we remove the corresponding pending_irq
> (possibly freeing it), and detect this case (irq_to_pending() returns
> NULL) when the LR gets cleaned up later.
> However a *new* LPI may get mapped with the same number while the old
> LPI is *still* in some LR. To avoid getting the wrong state, we mark
> every newly mapped LPI as PRISTINE, which means: has never been in an
> LR before. If we detect the LPI in an LR anyway, it must have been an
> older one, which we can simply retire.
> Before inserting such a PRISTINE LPI into an LR, we must make sure that
> it's not already in another LR, as the architecture forbids two
> interrupts with the same virtual IRQ number on one CPU.
>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  xen/arch/arm/gic.c         | 55 +++++++++++++++++++++++++++++++++++++++++-----
>  xen/include/asm-arm/vgic.h |  6 +++++
>  2 files changed, 56 insertions(+), 5 deletions(-)
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index fd3fa05..8bf0578 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -375,6 +375,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
>  {
>      ASSERT(!local_irq_is_enabled());
>
> +    clear_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status);
> +
>      gic_hw_ops->update_lr(lr, p, state);
>
>      set_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
> @@ -442,12 +444,41 @@ void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq)
>  #endif
>  }
>
> +/*
> + * Find an unused LR to insert an IRQ into. If this new interrupt is a
> + * PRISTINE LPI, scan the other LRs to avoid inserting the same IRQ twice.

This replicate here a part of the commit message regarding the 
pending_irq structure. So we have background in the code of why going 
through the LRs.

> + */
> +static int gic_find_unused_lr(struct vcpu *v, struct pending_irq *p, int lr)

This should be unsigned for both the return and the lr variable. Also 
please explain the goal of the variable because it is not clear from the 
callers.

> +{
> +    unsigned int nr_lrs = gic_hw_ops->info->nr_lrs;
> +    unsigned long *lr_mask = (unsigned long *) &this_cpu(lr_mask);
> +    struct gic_lr lr_val;
> +
> +    ASSERT(spin_is_locked(&v->arch.vgic.lock));
> +
> +    if ( test_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )
> +    {
> +        int used_lr = 0;

find_next_bit return unsigned long. So likely you want to use either 
unsigned int or unsigned long here.

> +
> +        while ( (used_lr = find_next_bit(lr_mask, nr_lrs, used_lr)) < nr_lrs )
> +        {
> +            gic_hw_ops->read_lr(used_lr, &lr_val);
> +            if ( lr_val.virq == p->irq )
> +                return used_lr;
> +        }
> +    }
> +
> +    lr = find_next_zero_bit(lr_mask, nr_lrs, lr);
> +
> +    return lr;
> +}
> +
>  void gic_raise_guest_irq(struct vcpu *v, unsigned int virtual_irq,
>          unsigned int priority)
>  {
> -    int i;
> -    unsigned int nr_lrs = gic_hw_ops->info->nr_lrs;
>      struct pending_irq *p = irq_to_pending(v, virtual_irq);
> +    unsigned int nr_lrs = gic_hw_ops->info->nr_lrs;

Why did you move around the variable? Please avoid such things, it is 
not like the ITS series is easy to review...

> +    int i = nr_lrs;

I don't understand why you initialize i to nr_lrs. i will always have 
the value reset by the return of gic_find_unused_lr before been used.

>
>      ASSERT(spin_is_locked(&v->arch.vgic.lock));
>
> @@ -457,7 +488,8 @@ void gic_raise_guest_irq(struct vcpu *v, unsigned int virtual_irq,
>
>      if ( v == current && list_empty(&v->arch.vgic.lr_pending) )
>      {
> -        i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
> +        i = gic_find_unused_lr(v, p, 0);
> +
>          if (i < nr_lrs) {
>              set_bit(i, &this_cpu(lr_mask));
>              gic_set_lr(i, p, GICH_LR_PENDING);
> @@ -509,7 +541,17 @@ static void gic_update_one_lr(struct vcpu *v, int i)
>      }
>      else if ( lr_val.state & GICH_LR_PENDING )
>      {
> -        int q __attribute__ ((unused)) = test_and_clear_bit(GIC_IRQ_GUEST_QUEUED, &p->status);
> +        int q __attribute__ ((unused));
> +
> +        if ( test_and_clear_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )
> +        {
> +            gic_hw_ops->clear_lr(i);
> +            clear_bit(i, &this_cpu(lr_mask));
> +
> +            return;
> +        }

This code is very similar to what you do at the beginning when 
pending_irq is NULL. I would prefer if you stick the check there rather 
than try to address in all the different paths.

I know it is not necessary to do it for the active path as LPI does not 
have active state, though this would have require an explanation in the 
commit message). But it is easier to maintain the test_and_clear_bit in 
a single place rather than spreading in 2 different one for little benefits.

> +
> +        q = test_and_clear_bit(GIC_IRQ_GUEST_QUEUED, &p->status);
>  #ifdef GIC_DEBUG
>          if ( q )
>              gdprintk(XENLOG_DEBUG, "trying to inject irq=%d into d%dv%d, when it is already pending in LR%d\n",
> @@ -521,6 +563,9 @@ static void gic_update_one_lr(struct vcpu *v, int i)
>          gic_hw_ops->clear_lr(i);
>          clear_bit(i, &this_cpu(lr_mask));
>
> +        if ( test_and_clear_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )
> +            return;
> +

This would avoid this check too.

>          if ( p->desc != NULL )
>              clear_bit(_IRQ_INPROGRESS, &p->desc->status);
>          clear_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
> @@ -591,7 +636,7 @@ static void gic_restore_pending_irqs(struct vcpu *v)
>      inflight_r = &v->arch.vgic.inflight_irqs;
>      list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
>      {
> -        lr = find_next_zero_bit(&this_cpu(lr_mask), nr_lrs, lr);
> +        lr = gic_find_unused_lr(v, p, lr);
>          if ( lr >= nr_lrs )
>          {
>              /* No more free LRs: find a lower priority irq to evict */
> diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
> index 02732db..3fc4ceb 100644
> --- a/xen/include/asm-arm/vgic.h
> +++ b/xen/include/asm-arm/vgic.h
> @@ -60,12 +60,18 @@ struct pending_irq
>       * vcpu while it is still inflight and on an GICH_LR register on the
>       * old vcpu.
>       *
> +     * GIC_IRQ_GUEST_PRISTINE_LPI: the IRQ is a newly mapped LPI, which
> +     * has never been in an LR before. This means that any trace of an
> +     * LPI with the same number in an LR must be from an older LPI, which
> +     * has been unmapped before.
> +     *
>       */
>  #define GIC_IRQ_GUEST_QUEUED   0
>  #define GIC_IRQ_GUEST_ACTIVE   1
>  #define GIC_IRQ_GUEST_VISIBLE  2
>  #define GIC_IRQ_GUEST_ENABLED  3
>  #define GIC_IRQ_GUEST_MIGRATING   4
> +#define GIC_IRQ_GUEST_PRISTINE_LPI  5
>      unsigned long status;
>      struct irq_desc *desc; /* only set it the irq corresponds to a physical irq */
>      unsigned int irq;
>

Cheers,

Stefano Stabellini May 20, 2017, 1:25 a.m. UTC | #2

On Thu, 11 May 2017, Andre Przywara wrote:
> When LPIs get unmapped by a guest, they might still be in some LR of
> some VCPU. Nevertheless we remove the corresponding pending_irq
> (possibly freeing it), and detect this case (irq_to_pending() returns
> NULL) when the LR gets cleaned up later.
> However a *new* LPI may get mapped with the same number while the old
> LPI is *still* in some LR. To avoid getting the wrong state, we mark
> every newly mapped LPI as PRISTINE, which means: has never been in an
> LR before. If we detect the LPI in an LR anyway, it must have been an
> older one, which we can simply retire.
> Before inserting such a PRISTINE LPI into an LR, we must make sure that
> it's not already in another LR, as the architecture forbids two
> interrupts with the same virtual IRQ number on one CPU.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  xen/arch/arm/gic.c         | 55 +++++++++++++++++++++++++++++++++++++++++-----
>  xen/include/asm-arm/vgic.h |  6 +++++
>  2 files changed, 56 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index fd3fa05..8bf0578 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -375,6 +375,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
>  {
>      ASSERT(!local_irq_is_enabled());
>  
> +    clear_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status);
> +
>      gic_hw_ops->update_lr(lr, p, state);
>  
>      set_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
> @@ -442,12 +444,41 @@ void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq)
>  #endif
>  }
>  
> +/*
> + * Find an unused LR to insert an IRQ into. If this new interrupt is a
> + * PRISTINE LPI, scan the other LRs to avoid inserting the same IRQ twice.
> + */
> +static int gic_find_unused_lr(struct vcpu *v, struct pending_irq *p, int lr)
> +{
> +    unsigned int nr_lrs = gic_hw_ops->info->nr_lrs;
> +    unsigned long *lr_mask = (unsigned long *) &this_cpu(lr_mask);
> +    struct gic_lr lr_val;
> +
> +    ASSERT(spin_is_locked(&v->arch.vgic.lock));
> +
> +    if ( test_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )

Maybe we should add an "unlikely".

I can see how this would be OKish at runtime, but at boot time there
might be a bunch of PRISTINE_LPIs, but no MAPDs have been issued yet,
right?

I have a suggestion, I'll leave it to you and Julien if you want to do
this now, or maybe consider it as a TODO item. I am OK either way (I
don't want to delay the ITS any longer).

I am thinking we should do this scanning only after at least one MAPD
has been issued for a given cpu at least once. I would resurrect the
idea of a DISCARD flag, but not on the pending_irq, that I believe it's
difficult to handle, but a single global DISCARD flag per struct vcpu.

On MAPD, we set DISCARD for the target vcpu of the LPI we are dropping.
Next time we want to inject a PRISTINE_IRQ on that cpu interface, we
scan all LRs for interrupts with a NULL pending_irq. We remove those
from LRs, then we remove the DISCARD flag.

Do you think it would work?


> +    {
> +        int used_lr = 0;
> +
> +        while ( (used_lr = find_next_bit(lr_mask, nr_lrs, used_lr)) < nr_lrs )
> +        {
> +            gic_hw_ops->read_lr(used_lr, &lr_val);
> +            if ( lr_val.virq == p->irq )
> +                return used_lr;
> +        }
> +    }
> +
> +    lr = find_next_zero_bit(lr_mask, nr_lrs, lr);
> +
> +    return lr;
> +}
> +
>  void gic_raise_guest_irq(struct vcpu *v, unsigned int virtual_irq,
>          unsigned int priority)
>  {
> -    int i;
> -    unsigned int nr_lrs = gic_hw_ops->info->nr_lrs;
>      struct pending_irq *p = irq_to_pending(v, virtual_irq);
> +    unsigned int nr_lrs = gic_hw_ops->info->nr_lrs;
> +    int i = nr_lrs;
>  
>      ASSERT(spin_is_locked(&v->arch.vgic.lock));
>  
> @@ -457,7 +488,8 @@ void gic_raise_guest_irq(struct vcpu *v, unsigned int virtual_irq,
>  
>      if ( v == current && list_empty(&v->arch.vgic.lr_pending) )
>      {
> -        i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
> +        i = gic_find_unused_lr(v, p, 0);
> +
>          if (i < nr_lrs) {
>              set_bit(i, &this_cpu(lr_mask));
>              gic_set_lr(i, p, GICH_LR_PENDING);
> @@ -509,7 +541,17 @@ static void gic_update_one_lr(struct vcpu *v, int i)
>      }
>      else if ( lr_val.state & GICH_LR_PENDING )
>      {
> -        int q __attribute__ ((unused)) = test_and_clear_bit(GIC_IRQ_GUEST_QUEUED, &p->status);
> +        int q __attribute__ ((unused));
> +
> +        if ( test_and_clear_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )
> +        {
> +            gic_hw_ops->clear_lr(i);
> +            clear_bit(i, &this_cpu(lr_mask));
> +
> +            return;
> +        }
> +
> +        q = test_and_clear_bit(GIC_IRQ_GUEST_QUEUED, &p->status);
>  #ifdef GIC_DEBUG
>          if ( q )
>              gdprintk(XENLOG_DEBUG, "trying to inject irq=%d into d%dv%d, when it is already pending in LR%d\n",
> @@ -521,6 +563,9 @@ static void gic_update_one_lr(struct vcpu *v, int i)
>          gic_hw_ops->clear_lr(i);
>          clear_bit(i, &this_cpu(lr_mask));
>  
> +        if ( test_and_clear_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )
> +            return;
>          if ( p->desc != NULL )
>              clear_bit(_IRQ_INPROGRESS, &p->desc->status);
>          clear_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
> @@ -591,7 +636,7 @@ static void gic_restore_pending_irqs(struct vcpu *v)
>      inflight_r = &v->arch.vgic.inflight_irqs;
>      list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
>      {
> -        lr = find_next_zero_bit(&this_cpu(lr_mask), nr_lrs, lr);
> +        lr = gic_find_unused_lr(v, p, lr);
>          if ( lr >= nr_lrs )
>          {
>              /* No more free LRs: find a lower priority irq to evict */
> diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
> index 02732db..3fc4ceb 100644
> --- a/xen/include/asm-arm/vgic.h
> +++ b/xen/include/asm-arm/vgic.h
> @@ -60,12 +60,18 @@ struct pending_irq
>       * vcpu while it is still inflight and on an GICH_LR register on the
>       * old vcpu.
>       *
> +     * GIC_IRQ_GUEST_PRISTINE_LPI: the IRQ is a newly mapped LPI, which
> +     * has never been in an LR before. This means that any trace of an
> +     * LPI with the same number in an LR must be from an older LPI, which
> +     * has been unmapped before.
> +     *
>       */
>  #define GIC_IRQ_GUEST_QUEUED   0
>  #define GIC_IRQ_GUEST_ACTIVE   1
>  #define GIC_IRQ_GUEST_VISIBLE  2
>  #define GIC_IRQ_GUEST_ENABLED  3
>  #define GIC_IRQ_GUEST_MIGRATING   4
> +#define GIC_IRQ_GUEST_PRISTINE_LPI  5
>      unsigned long status;
>      struct irq_desc *desc; /* only set it the irq corresponds to a physical irq */
>      unsigned int irq;
> -- 
> 2.9.0
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel
>

Stefano Stabellini May 22, 2017, 11:48 p.m. UTC | #3

On Fri, 19 May 2017, Stefano Stabellini wrote:
> On Thu, 11 May 2017, Andre Przywara wrote:
> > When LPIs get unmapped by a guest, they might still be in some LR of
> > some VCPU. Nevertheless we remove the corresponding pending_irq
> > (possibly freeing it), and detect this case (irq_to_pending() returns
> > NULL) when the LR gets cleaned up later.
> > However a *new* LPI may get mapped with the same number while the old
> > LPI is *still* in some LR. To avoid getting the wrong state, we mark
> > every newly mapped LPI as PRISTINE, which means: has never been in an
> > LR before. If we detect the LPI in an LR anyway, it must have been an
> > older one, which we can simply retire.
> > Before inserting such a PRISTINE LPI into an LR, we must make sure that
> > it's not already in another LR, as the architecture forbids two
> > interrupts with the same virtual IRQ number on one CPU.
> > 
> > Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> > ---
> >  xen/arch/arm/gic.c         | 55 +++++++++++++++++++++++++++++++++++++++++-----
> >  xen/include/asm-arm/vgic.h |  6 +++++
> >  2 files changed, 56 insertions(+), 5 deletions(-)
> > 
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index fd3fa05..8bf0578 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -375,6 +375,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
> >  {
> >      ASSERT(!local_irq_is_enabled());
> >  
> > +    clear_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status);
> > +
> >      gic_hw_ops->update_lr(lr, p, state);
> >  
> >      set_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
> > @@ -442,12 +444,41 @@ void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq)
> >  #endif
> >  }
> >  
> > +/*
> > + * Find an unused LR to insert an IRQ into. If this new interrupt is a
> > + * PRISTINE LPI, scan the other LRs to avoid inserting the same IRQ twice.
> > + */
> > +static int gic_find_unused_lr(struct vcpu *v, struct pending_irq *p, int lr)
> > +{
> > +    unsigned int nr_lrs = gic_hw_ops->info->nr_lrs;
> > +    unsigned long *lr_mask = (unsigned long *) &this_cpu(lr_mask);
> > +    struct gic_lr lr_val;
> > +
> > +    ASSERT(spin_is_locked(&v->arch.vgic.lock));
> > +
> > +    if ( test_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )
> 
> Maybe we should add an "unlikely".
> 
> I can see how this would be OKish at runtime, but at boot time there
> might be a bunch of PRISTINE_LPIs, but no MAPDs have been issued yet,
> right?
> 
> I have a suggestion, I'll leave it to you and Julien if you want to do
> this now, or maybe consider it as a TODO item. I am OK either way (I
> don't want to delay the ITS any longer).
> 
> I am thinking we should do this scanning only after at least one MAPD
> has been issued for a given cpu at least once. I would resurrect the
> idea of a DISCARD flag, but not on the pending_irq, that I believe it's
> difficult to handle, but a single global DISCARD flag per struct vcpu.
> 
> On MAPD, we set DISCARD for the target vcpu of the LPI we are dropping.
> Next time we want to inject a PRISTINE_IRQ on that cpu interface, we
> scan all LRs for interrupts with a NULL pending_irq. We remove those
> from LRs, then we remove the DISCARD flag.
> 
> Do you think it would work?

This would need to be done not only on MAPD but also on DISCARD (and on
any other commands that would cause a pending_irq - vLPI mapping to be
dropped).


> > +    {
> > +        int used_lr = 0;
> > +
> > +        while ( (used_lr = find_next_bit(lr_mask, nr_lrs, used_lr)) < nr_lrs )
> > +        {
> > +            gic_hw_ops->read_lr(used_lr, &lr_val);
> > +            if ( lr_val.virq == p->irq )
> > +                return used_lr;
> > +        }
> > +    }
> > +
> > +    lr = find_next_zero_bit(lr_mask, nr_lrs, lr);
> > +
> > +    return lr;
> > +}
> > +
> >  void gic_raise_guest_irq(struct vcpu *v, unsigned int virtual_irq,
> >          unsigned int priority)
> >  {
> > -    int i;
> > -    unsigned int nr_lrs = gic_hw_ops->info->nr_lrs;
> >      struct pending_irq *p = irq_to_pending(v, virtual_irq);
> > +    unsigned int nr_lrs = gic_hw_ops->info->nr_lrs;
> > +    int i = nr_lrs;
> >  
> >      ASSERT(spin_is_locked(&v->arch.vgic.lock));
> >  
> > @@ -457,7 +488,8 @@ void gic_raise_guest_irq(struct vcpu *v, unsigned int virtual_irq,
> >  
> >      if ( v == current && list_empty(&v->arch.vgic.lr_pending) )
> >      {
> > -        i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
> > +        i = gic_find_unused_lr(v, p, 0);
> > +
> >          if (i < nr_lrs) {
> >              set_bit(i, &this_cpu(lr_mask));
> >              gic_set_lr(i, p, GICH_LR_PENDING);
> > @@ -509,7 +541,17 @@ static void gic_update_one_lr(struct vcpu *v, int i)
> >      }
> >      else if ( lr_val.state & GICH_LR_PENDING )
> >      {
> > -        int q __attribute__ ((unused)) = test_and_clear_bit(GIC_IRQ_GUEST_QUEUED, &p->status);
> > +        int q __attribute__ ((unused));
> > +
> > +        if ( test_and_clear_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )
> > +        {
> > +            gic_hw_ops->clear_lr(i);
> > +            clear_bit(i, &this_cpu(lr_mask));
> > +
> > +            return;
> > +        }
> > +
> > +        q = test_and_clear_bit(GIC_IRQ_GUEST_QUEUED, &p->status);
> >  #ifdef GIC_DEBUG
> >          if ( q )
> >              gdprintk(XENLOG_DEBUG, "trying to inject irq=%d into d%dv%d, when it is already pending in LR%d\n",
> > @@ -521,6 +563,9 @@ static void gic_update_one_lr(struct vcpu *v, int i)
> >          gic_hw_ops->clear_lr(i);
> >          clear_bit(i, &this_cpu(lr_mask));
> >  
> > +        if ( test_and_clear_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )
> > +            return;
> >          if ( p->desc != NULL )
> >              clear_bit(_IRQ_INPROGRESS, &p->desc->status);
> >          clear_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
> > @@ -591,7 +636,7 @@ static void gic_restore_pending_irqs(struct vcpu *v)
> >      inflight_r = &v->arch.vgic.inflight_irqs;
> >      list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
> >      {
> > -        lr = find_next_zero_bit(&this_cpu(lr_mask), nr_lrs, lr);
> > +        lr = gic_find_unused_lr(v, p, lr);
> >          if ( lr >= nr_lrs )
> >          {
> >              /* No more free LRs: find a lower priority irq to evict */
> > diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
> > index 02732db..3fc4ceb 100644
> > --- a/xen/include/asm-arm/vgic.h
> > +++ b/xen/include/asm-arm/vgic.h
> > @@ -60,12 +60,18 @@ struct pending_irq
> >       * vcpu while it is still inflight and on an GICH_LR register on the
> >       * old vcpu.
> >       *
> > +     * GIC_IRQ_GUEST_PRISTINE_LPI: the IRQ is a newly mapped LPI, which
> > +     * has never been in an LR before. This means that any trace of an
> > +     * LPI with the same number in an LR must be from an older LPI, which
> > +     * has been unmapped before.
> > +     *
> >       */
> >  #define GIC_IRQ_GUEST_QUEUED   0
> >  #define GIC_IRQ_GUEST_ACTIVE   1
> >  #define GIC_IRQ_GUEST_VISIBLE  2
> >  #define GIC_IRQ_GUEST_ENABLED  3
> >  #define GIC_IRQ_GUEST_MIGRATING   4
> > +#define GIC_IRQ_GUEST_PRISTINE_LPI  5
> >      unsigned long status;
> >      struct irq_desc *desc; /* only set it the irq corresponds to a physical irq */
> >      unsigned int irq;
> > -- 
> > 2.9.0
> > 
> > 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > https://lists.xen.org/xen-devel
> > 
>

Julien Grall May 23, 2017, 11:10 a.m. UTC | #4

Hi Stefano,

On 23/05/17 00:48, Stefano Stabellini wrote:
> On Fri, 19 May 2017, Stefano Stabellini wrote:
>> On Thu, 11 May 2017, Andre Przywara wrote:
>>> When LPIs get unmapped by a guest, they might still be in some LR of
>>> some VCPU. Nevertheless we remove the corresponding pending_irq
>>> (possibly freeing it), and detect this case (irq_to_pending() returns
>>> NULL) when the LR gets cleaned up later.
>>> However a *new* LPI may get mapped with the same number while the old
>>> LPI is *still* in some LR. To avoid getting the wrong state, we mark
>>> every newly mapped LPI as PRISTINE, which means: has never been in an
>>> LR before. If we detect the LPI in an LR anyway, it must have been an
>>> older one, which we can simply retire.
>>> Before inserting such a PRISTINE LPI into an LR, we must make sure that
>>> it's not already in another LR, as the architecture forbids two
>>> interrupts with the same virtual IRQ number on one CPU.
>>>
>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>>> ---
>>>  xen/arch/arm/gic.c         | 55 +++++++++++++++++++++++++++++++++++++++++-----
>>>  xen/include/asm-arm/vgic.h |  6 +++++
>>>  2 files changed, 56 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>>> index fd3fa05..8bf0578 100644
>>> --- a/xen/arch/arm/gic.c
>>> +++ b/xen/arch/arm/gic.c
>>> @@ -375,6 +375,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
>>>  {
>>>      ASSERT(!local_irq_is_enabled());
>>>
>>> +    clear_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status);
>>> +
>>>      gic_hw_ops->update_lr(lr, p, state);
>>>
>>>      set_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
>>> @@ -442,12 +444,41 @@ void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq)
>>>  #endif
>>>  }
>>>
>>> +/*
>>> + * Find an unused LR to insert an IRQ into. If this new interrupt is a
>>> + * PRISTINE LPI, scan the other LRs to avoid inserting the same IRQ twice.
>>> + */
>>> +static int gic_find_unused_lr(struct vcpu *v, struct pending_irq *p, int lr)
>>> +{
>>> +    unsigned int nr_lrs = gic_hw_ops->info->nr_lrs;
>>> +    unsigned long *lr_mask = (unsigned long *) &this_cpu(lr_mask);
>>> +    struct gic_lr lr_val;
>>> +
>>> +    ASSERT(spin_is_locked(&v->arch.vgic.lock));
>>> +
>>> +    if ( test_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )
>>
>> Maybe we should add an "unlikely".
>>
>> I can see how this would be OKish at runtime, but at boot time there
>> might be a bunch of PRISTINE_LPIs, but no MAPDs have been issued yet,
>> right?

You cannot have any PRISTINE_LPIs without any MAPDs done. This bit will 
be set when you do the first MAPTI.

>>
>> I have a suggestion, I'll leave it to you and Julien if you want to do
>> this now, or maybe consider it as a TODO item. I am OK either way (I
>> don't want to delay the ITS any longer).
>>
>> I am thinking we should do this scanning only after at least one MAPD
>> has been issued for a given cpu at least once. I would resurrect the
>> idea of a DISCARD flag, but not on the pending_irq, that I believe it's
>> difficult to handle, but a single global DISCARD flag per struct vcpu.
>>
>> On MAPD, we set DISCARD for the target vcpu of the LPI we are dropping.
>> Next time we want to inject a PRISTINE_IRQ on that cpu interface, we
>> scan all LRs for interrupts with a NULL pending_irq. We remove those
>> from LRs, then we remove the DISCARD flag.
>>
>> Do you think it would work?

I don't understand the point to do that. Ok, you will get the first 
PRISTINE_LPI "fast" (though likely LRs will be empty), but all the other 
will be "slow" (though likely LRs will be empty too).

The pain to implement your suggestion does not seem to be worth it so far.

Cheers,

Andre Przywara May 23, 2017, 2:41 p.m. UTC | #5

Hi Stefano,

On 20/05/17 02:25, Stefano Stabellini wrote:
> On Thu, 11 May 2017, Andre Przywara wrote:
>> When LPIs get unmapped by a guest, they might still be in some LR of
>> some VCPU. Nevertheless we remove the corresponding pending_irq
>> (possibly freeing it), and detect this case (irq_to_pending() returns
>> NULL) when the LR gets cleaned up later.
>> However a *new* LPI may get mapped with the same number while the old
>> LPI is *still* in some LR. To avoid getting the wrong state, we mark
>> every newly mapped LPI as PRISTINE, which means: has never been in an
>> LR before. If we detect the LPI in an LR anyway, it must have been an
>> older one, which we can simply retire.
>> Before inserting such a PRISTINE LPI into an LR, we must make sure that
>> it's not already in another LR, as the architecture forbids two
>> interrupts with the same virtual IRQ number on one CPU.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>>  xen/arch/arm/gic.c         | 55 +++++++++++++++++++++++++++++++++++++++++-----
>>  xen/include/asm-arm/vgic.h |  6 +++++
>>  2 files changed, 56 insertions(+), 5 deletions(-)
>>
>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> index fd3fa05..8bf0578 100644
>> --- a/xen/arch/arm/gic.c
>> +++ b/xen/arch/arm/gic.c
>> @@ -375,6 +375,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
>>  {
>>      ASSERT(!local_irq_is_enabled());
>>  
>> +    clear_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status);
>> +
>>      gic_hw_ops->update_lr(lr, p, state);
>>  
>>      set_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
>> @@ -442,12 +444,41 @@ void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq)
>>  #endif
>>  }
>>  
>> +/*
>> + * Find an unused LR to insert an IRQ into. If this new interrupt is a
>> + * PRISTINE LPI, scan the other LRs to avoid inserting the same IRQ twice.
>> + */
>> +static int gic_find_unused_lr(struct vcpu *v, struct pending_irq *p, int lr)
>> +{
>> +    unsigned int nr_lrs = gic_hw_ops->info->nr_lrs;
>> +    unsigned long *lr_mask = (unsigned long *) &this_cpu(lr_mask);
>> +    struct gic_lr lr_val;
>> +
>> +    ASSERT(spin_is_locked(&v->arch.vgic.lock));
>> +
>> +    if ( test_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )
> 
> Maybe we should add an "unlikely".
> 
> I can see how this would be OKish at runtime, but at boot time there
> might be a bunch of PRISTINE_LPIs,

What is your concern here, performance?
Let's put this into perspective:
- The PRISTINE bit gets set upon MAPTI, which Linux usually does *once*
when the driver gets loaded. It gets cleared after the first injection.
- If that happens, we scan all LRs. Most implementations have 4(!) of
them (ARM GIC implementations, for instance), also the algorithm only
scans *used* LRs, so normally just one or two.
- Reading the LR is a *local* system register *read*, not an MMIO
access, and not propagated to other cores. Yes, this may be "costly"
(compared to other instructions), but it's probably still cheaper than a
page table walk (TLB miss) or L2 cache miss.

So to summarize: this is rare, iterates over only a very small number of
registers and is not hugely expensive.
At this point in time I would refrain from any kind of performance
optimization, at least unless we have solved all the other issues and
have done some benchmarking/profiling (on different hardware platforms).

> but no MAPDs have been issued yet, right?

As Julien already mentioned, this gets set after a MAPTI, which requires
a MAPD before.

Cheers,
Andre.

> I have a suggestion, I'll leave it to you and Julien if you want to do
> this now, or maybe consider it as a TODO item. I am OK either way (I
> don't want to delay the ITS any longer).
> 
> I am thinking we should do this scanning only after at least one MAPD
> has been issued for a given cpu at least once. I would resurrect the
> idea of a DISCARD flag, but not on the pending_irq, that I believe it's
> difficult to handle, but a single global DISCARD flag per struct vcpu.
> 
> On MAPD, we set DISCARD for the target vcpu of the LPI we are dropping.
> Next time we want to inject a PRISTINE_IRQ on that cpu interface, we
> scan all LRs for interrupts with a NULL pending_irq. We remove those
> from LRs, then we remove the DISCARD flag.
> 
> Do you think it would work?
> 
> 
>> +    {
>> +        int used_lr = 0;
>> +
>> +        while ( (used_lr = find_next_bit(lr_mask, nr_lrs, used_lr)) < nr_lrs )
>> +        {
>> +            gic_hw_ops->read_lr(used_lr, &lr_val);
>> +            if ( lr_val.virq == p->irq )
>> +                return used_lr;
>> +        }
>> +    }
>> +
>> +    lr = find_next_zero_bit(lr_mask, nr_lrs, lr);
>> +
>> +    return lr;
>> +}
>> +
>>  void gic_raise_guest_irq(struct vcpu *v, unsigned int virtual_irq,
>>          unsigned int priority)
>>  {
>> -    int i;
>> -    unsigned int nr_lrs = gic_hw_ops->info->nr_lrs;
>>      struct pending_irq *p = irq_to_pending(v, virtual_irq);
>> +    unsigned int nr_lrs = gic_hw_ops->info->nr_lrs;
>> +    int i = nr_lrs;
>>  
>>      ASSERT(spin_is_locked(&v->arch.vgic.lock));
>>  
>> @@ -457,7 +488,8 @@ void gic_raise_guest_irq(struct vcpu *v, unsigned int virtual_irq,
>>  
>>      if ( v == current && list_empty(&v->arch.vgic.lr_pending) )
>>      {
>> -        i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
>> +        i = gic_find_unused_lr(v, p, 0);
>> +
>>          if (i < nr_lrs) {
>>              set_bit(i, &this_cpu(lr_mask));
>>              gic_set_lr(i, p, GICH_LR_PENDING);
>> @@ -509,7 +541,17 @@ static void gic_update_one_lr(struct vcpu *v, int i)
>>      }
>>      else if ( lr_val.state & GICH_LR_PENDING )
>>      {
>> -        int q __attribute__ ((unused)) = test_and_clear_bit(GIC_IRQ_GUEST_QUEUED, &p->status);
>> +        int q __attribute__ ((unused));
>> +
>> +        if ( test_and_clear_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )
>> +        {
>> +            gic_hw_ops->clear_lr(i);
>> +            clear_bit(i, &this_cpu(lr_mask));
>> +
>> +            return;
>> +        }
>> +
>> +        q = test_and_clear_bit(GIC_IRQ_GUEST_QUEUED, &p->status);
>>  #ifdef GIC_DEBUG
>>          if ( q )
>>              gdprintk(XENLOG_DEBUG, "trying to inject irq=%d into d%dv%d, when it is already pending in LR%d\n",
>> @@ -521,6 +563,9 @@ static void gic_update_one_lr(struct vcpu *v, int i)
>>          gic_hw_ops->clear_lr(i);
>>          clear_bit(i, &this_cpu(lr_mask));
>>  
>> +        if ( test_and_clear_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )
>> +            return;
>>          if ( p->desc != NULL )
>>              clear_bit(_IRQ_INPROGRESS, &p->desc->status);
>>          clear_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
>> @@ -591,7 +636,7 @@ static void gic_restore_pending_irqs(struct vcpu *v)
>>      inflight_r = &v->arch.vgic.inflight_irqs;
>>      list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
>>      {
>> -        lr = find_next_zero_bit(&this_cpu(lr_mask), nr_lrs, lr);
>> +        lr = gic_find_unused_lr(v, p, lr);
>>          if ( lr >= nr_lrs )
>>          {
>>              /* No more free LRs: find a lower priority irq to evict */
>> diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
>> index 02732db..3fc4ceb 100644
>> --- a/xen/include/asm-arm/vgic.h
>> +++ b/xen/include/asm-arm/vgic.h
>> @@ -60,12 +60,18 @@ struct pending_irq
>>       * vcpu while it is still inflight and on an GICH_LR register on the
>>       * old vcpu.
>>       *
>> +     * GIC_IRQ_GUEST_PRISTINE_LPI: the IRQ is a newly mapped LPI, which
>> +     * has never been in an LR before. This means that any trace of an
>> +     * LPI with the same number in an LR must be from an older LPI, which
>> +     * has been unmapped before.
>> +     *
>>       */
>>  #define GIC_IRQ_GUEST_QUEUED   0
>>  #define GIC_IRQ_GUEST_ACTIVE   1
>>  #define GIC_IRQ_GUEST_VISIBLE  2
>>  #define GIC_IRQ_GUEST_ENABLED  3
>>  #define GIC_IRQ_GUEST_MIGRATING   4
>> +#define GIC_IRQ_GUEST_PRISTINE_LPI  5
>>      unsigned long status;
>>      struct irq_desc *desc; /* only set it the irq corresponds to a physical irq */
>>      unsigned int irq;
>> -- 
>> 2.9.0
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> https://lists.xen.org/xen-devel
>>

Stefano Stabellini May 23, 2017, 6:23 p.m. UTC | #6

On Tue, 23 May 2017, Julien Grall wrote:
> Hi Stefano,
> 
> On 23/05/17 00:48, Stefano Stabellini wrote:
> > On Fri, 19 May 2017, Stefano Stabellini wrote:
> > > On Thu, 11 May 2017, Andre Przywara wrote:
> > > > When LPIs get unmapped by a guest, they might still be in some LR of
> > > > some VCPU. Nevertheless we remove the corresponding pending_irq
> > > > (possibly freeing it), and detect this case (irq_to_pending() returns
> > > > NULL) when the LR gets cleaned up later.
> > > > However a *new* LPI may get mapped with the same number while the old
> > > > LPI is *still* in some LR. To avoid getting the wrong state, we mark
> > > > every newly mapped LPI as PRISTINE, which means: has never been in an
> > > > LR before. If we detect the LPI in an LR anyway, it must have been an
> > > > older one, which we can simply retire.
> > > > Before inserting such a PRISTINE LPI into an LR, we must make sure that
> > > > it's not already in another LR, as the architecture forbids two
> > > > interrupts with the same virtual IRQ number on one CPU.
> > > > 
> > > > Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> > > > ---
> > > >  xen/arch/arm/gic.c         | 55
> > > > +++++++++++++++++++++++++++++++++++++++++-----
> > > >  xen/include/asm-arm/vgic.h |  6 +++++
> > > >  2 files changed, 56 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > > > index fd3fa05..8bf0578 100644
> > > > --- a/xen/arch/arm/gic.c
> > > > +++ b/xen/arch/arm/gic.c
> > > > @@ -375,6 +375,8 @@ static inline void gic_set_lr(int lr, struct
> > > > pending_irq *p,
> > > >  {
> > > >      ASSERT(!local_irq_is_enabled());
> > > > 
> > > > +    clear_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status);
> > > > +
> > > >      gic_hw_ops->update_lr(lr, p, state);
> > > > 
> > > >      set_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
> > > > @@ -442,12 +444,41 @@ void gic_raise_inflight_irq(struct vcpu *v,
> > > > unsigned int virtual_irq)
> > > >  #endif
> > > >  }
> > > > 
> > > > +/*
> > > > + * Find an unused LR to insert an IRQ into. If this new interrupt is a
> > > > + * PRISTINE LPI, scan the other LRs to avoid inserting the same IRQ
> > > > twice.
> > > > + */
> > > > +static int gic_find_unused_lr(struct vcpu *v, struct pending_irq *p,
> > > > int lr)
> > > > +{
> > > > +    unsigned int nr_lrs = gic_hw_ops->info->nr_lrs;
> > > > +    unsigned long *lr_mask = (unsigned long *) &this_cpu(lr_mask);
> > > > +    struct gic_lr lr_val;
> > > > +
> > > > +    ASSERT(spin_is_locked(&v->arch.vgic.lock));
> > > > +
> > > > +    if ( test_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )
> > > 
> > > Maybe we should add an "unlikely".
> > > 
> > > I can see how this would be OKish at runtime, but at boot time there
> > > might be a bunch of PRISTINE_LPIs, but no MAPDs have been issued yet,
> > > right?
> 
> You cannot have any PRISTINE_LPIs without any MAPDs done. This bit will be set
> when you do the first MAPTI.
> 
> > > 
> > > I have a suggestion, I'll leave it to you and Julien if you want to do
> > > this now, or maybe consider it as a TODO item. I am OK either way (I
> > > don't want to delay the ITS any longer).
> > > 
> > > I am thinking we should do this scanning only after at least one MAPD
> > > has been issued for a given cpu at least once. I would resurrect the
> > > idea of a DISCARD flag, but not on the pending_irq, that I believe it's
> > > difficult to handle, but a single global DISCARD flag per struct vcpu.
> > > 
> > > On MAPD, we set DISCARD for the target vcpu of the LPI we are dropping.
> > > Next time we want to inject a PRISTINE_IRQ on that cpu interface, we
> > > scan all LRs for interrupts with a NULL pending_irq. We remove those
> > > from LRs, then we remove the DISCARD flag.
> > > 
> > > Do you think it would work?
> 
> I don't understand the point to do that. Ok, you will get the first
> PRISTINE_LPI "fast" (though likely LRs will be empty), but all the other will
> be "slow" (though likely LRs will be empty too).
> 
> The pain to implement your suggestion does not seem to be worth it so far.

Let me explain it a bit better, I think I didn't clarify it well enough.
Let me also premise, that this would be fine to do later, it doesn't
have to be part of this patch.

When I wrote MAPD above, I meant actually any commands that delete an
existing pending_irq - vLPI mapping. Specifically, DISCARD, and MAPD
when the 

    if ( !valid )
        /* Discard all events and remove pending LPIs. */
        its_unmap_device(its, devid);

code path is taken, which should not be the case at boot time, right?
Are there any other commands that remove a pending_irq - vLPI mapping
that I missed?

The idea is that we could add a VGIC_V3_LPIS_DISCARD flag to arch_vcpu.
VGIC_V3_LPIS_DISCARD is set on a DISCARD command, and on a MAPD (!valid)
command. If VGIC_V3_LPIS_DISCARD is not set, there is no need to scan
anything. If VGIC_V3_LPIS_DISCARD is set *and* we want to inject a
PRISTINE_IRQ, then we do the scanning.

When we do the scanning, we check all LRs for NULL pending_irq structs.
We clear them all in one go. Then we remove VGIC_V3_LPIS_DISCARD.

This way, we get all PRISTINE_LPI fast, except for the very first one
after a DISCARD or MAPD (!valid) command.

Does it make more sense now? What do you think?

Julien Grall May 24, 2017, 9:47 a.m. UTC | #7

Hi Stefano,

On 05/23/2017 07:23 PM, Stefano Stabellini wrote:
> On Tue, 23 May 2017, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 23/05/17 00:48, Stefano Stabellini wrote:
>>> On Fri, 19 May 2017, Stefano Stabellini wrote:
>>>> On Thu, 11 May 2017, Andre Przywara wrote:
>>>>> When LPIs get unmapped by a guest, they might still be in some LR of
>>>>> some VCPU. Nevertheless we remove the corresponding pending_irq
>>>>> (possibly freeing it), and detect this case (irq_to_pending() returns
>>>>> NULL) when the LR gets cleaned up later.
>>>>> However a *new* LPI may get mapped with the same number while the old
>>>>> LPI is *still* in some LR. To avoid getting the wrong state, we mark
>>>>> every newly mapped LPI as PRISTINE, which means: has never been in an
>>>>> LR before. If we detect the LPI in an LR anyway, it must have been an
>>>>> older one, which we can simply retire.
>>>>> Before inserting such a PRISTINE LPI into an LR, we must make sure that
>>>>> it's not already in another LR, as the architecture forbids two
>>>>> interrupts with the same virtual IRQ number on one CPU.
>>>>>
>>>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>>>>> ---
>>>>>  xen/arch/arm/gic.c         | 55
>>>>> +++++++++++++++++++++++++++++++++++++++++-----
>>>>>  xen/include/asm-arm/vgic.h |  6 +++++
>>>>>  2 files changed, 56 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>>>>> index fd3fa05..8bf0578 100644
>>>>> --- a/xen/arch/arm/gic.c
>>>>> +++ b/xen/arch/arm/gic.c
>>>>> @@ -375,6 +375,8 @@ static inline void gic_set_lr(int lr, struct
>>>>> pending_irq *p,
>>>>>  {
>>>>>      ASSERT(!local_irq_is_enabled());
>>>>>
>>>>> +    clear_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status);
>>>>> +
>>>>>      gic_hw_ops->update_lr(lr, p, state);
>>>>>
>>>>>      set_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
>>>>> @@ -442,12 +444,41 @@ void gic_raise_inflight_irq(struct vcpu *v,
>>>>> unsigned int virtual_irq)
>>>>>  #endif
>>>>>  }
>>>>>
>>>>> +/*
>>>>> + * Find an unused LR to insert an IRQ into. If this new interrupt is a
>>>>> + * PRISTINE LPI, scan the other LRs to avoid inserting the same IRQ
>>>>> twice.
>>>>> + */
>>>>> +static int gic_find_unused_lr(struct vcpu *v, struct pending_irq *p,
>>>>> int lr)
>>>>> +{
>>>>> +    unsigned int nr_lrs = gic_hw_ops->info->nr_lrs;
>>>>> +    unsigned long *lr_mask = (unsigned long *) &this_cpu(lr_mask);
>>>>> +    struct gic_lr lr_val;
>>>>> +
>>>>> +    ASSERT(spin_is_locked(&v->arch.vgic.lock));
>>>>> +
>>>>> +    if ( test_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )
>>>>
>>>> Maybe we should add an "unlikely".
>>>>
>>>> I can see how this would be OKish at runtime, but at boot time there
>>>> might be a bunch of PRISTINE_LPIs, but no MAPDs have been issued yet,
>>>> right?
>>
>> You cannot have any PRISTINE_LPIs without any MAPDs done. This bit will be set
>> when you do the first MAPTI.
>>
>>>>
>>>> I have a suggestion, I'll leave it to you and Julien if you want to do
>>>> this now, or maybe consider it as a TODO item. I am OK either way (I
>>>> don't want to delay the ITS any longer).
>>>>
>>>> I am thinking we should do this scanning only after at least one MAPD
>>>> has been issued for a given cpu at least once. I would resurrect the
>>>> idea of a DISCARD flag, but not on the pending_irq, that I believe it's
>>>> difficult to handle, but a single global DISCARD flag per struct vcpu.
>>>>
>>>> On MAPD, we set DISCARD for the target vcpu of the LPI we are dropping.
>>>> Next time we want to inject a PRISTINE_IRQ on that cpu interface, we
>>>> scan all LRs for interrupts with a NULL pending_irq. We remove those
>>>> from LRs, then we remove the DISCARD flag.
>>>>
>>>> Do you think it would work?
>>
>> I don't understand the point to do that. Ok, you will get the first
>> PRISTINE_LPI "fast" (though likely LRs will be empty), but all the other will
>> be "slow" (though likely LRs will be empty too).
>>
>> The pain to implement your suggestion does not seem to be worth it so far.
>
> Let me explain it a bit better, I think I didn't clarify it well enough.
> Let me also premise, that this would be fine to do later, it doesn't
> have to be part of this patch.
>
> When I wrote MAPD above, I meant actually any commands that delete an
> existing pending_irq - vLPI mapping. Specifically, DISCARD, and MAPD
> when the
>
>     if ( !valid )
>         /* Discard all events and remove pending LPIs. */
>         its_unmap_device(its, devid);
>
> code path is taken, which should not be the case at boot time, right?
> Are there any other commands that remove a pending_irq - vLPI mapping
> that I missed?

I don't think so.

>
> The idea is that we could add a VGIC_V3_LPIS_DISCARD flag to arch_vcpu.
> VGIC_V3_LPIS_DISCARD is set on a DISCARD command, and on a MAPD (!valid)
> command. If VGIC_V3_LPIS_DISCARD is not set, there is no need to scan
> anything. If VGIC_V3_LPIS_DISCARD is set *and* we want to inject a
> PRISTINE_IRQ, then we do the scanning.
>
> When we do the scanning, we check all LRs for NULL pending_irq structs.
> We clear them all in one go. Then we remove VGIC_V3_LPIS_DISCARD.

The problem we are trying to solve here is not because of NULL 
pending_irq structs. It is because of the previous interrupt may still 
be in LRs so we would end-up to have twice the LPIs in it. This will 
result to unpredictable behavior.

This could happen if you do:

     vCPU A               |   vCPU  B
                          |
     DISCARD vLPI1        |
     MAPTI   vLPI1        |

                 interrupt injected on vCPU B

                          |   entering in the hyp
                          |   clear_lrs
                          |   vgic_vcpu_inject_irq


clear_lrs will not remove the interrupt from LRs if it was already 
pending because pending_irq is not NULL anymore.

This is causing issue because we are trying to be clever with the LRs by 
not regenerating them at every entry/exit. This is causing trouble in 
many more place in the vGIC. IHMO we should attempt to regenerate them 
and see how this will affect the performance.

>
> This way, we get all PRISTINE_LPI fast, except for the very first one
> after a DISCARD or MAPD (!valid) command.
>
> Does it make more sense now? What do you think?

To be honest, I think we are trying to think about premature 
optimization without any number. We should first look at the vGIC rework 
and then see if this code will stay in place (I have the feeling it will 
disappear).

Cheers,

Stefano Stabellini May 24, 2017, 5:49 p.m. UTC | #8

On Wed, 24 May 2017, Julien Grall wrote:
> Hi Stefano,
> 
> On 05/23/2017 07:23 PM, Stefano Stabellini wrote:
> > On Tue, 23 May 2017, Julien Grall wrote:
> > > Hi Stefano,
> > > 
> > > On 23/05/17 00:48, Stefano Stabellini wrote:
> > > > On Fri, 19 May 2017, Stefano Stabellini wrote:
> > > > > On Thu, 11 May 2017, Andre Przywara wrote:
> > > > > > When LPIs get unmapped by a guest, they might still be in some LR of
> > > > > > some VCPU. Nevertheless we remove the corresponding pending_irq
> > > > > > (possibly freeing it), and detect this case (irq_to_pending()
> > > > > > returns
> > > > > > NULL) when the LR gets cleaned up later.
> > > > > > However a *new* LPI may get mapped with the same number while the
> > > > > > old
> > > > > > LPI is *still* in some LR. To avoid getting the wrong state, we mark
> > > > > > every newly mapped LPI as PRISTINE, which means: has never been in
> > > > > > an
> > > > > > LR before. If we detect the LPI in an LR anyway, it must have been
> > > > > > an
> > > > > > older one, which we can simply retire.
> > > > > > Before inserting such a PRISTINE LPI into an LR, we must make sure
> > > > > > that
> > > > > > it's not already in another LR, as the architecture forbids two
> > > > > > interrupts with the same virtual IRQ number on one CPU.
> > > > > > 
> > > > > > Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> > > > > > ---
> > > > > >  xen/arch/arm/gic.c         | 55
> > > > > > +++++++++++++++++++++++++++++++++++++++++-----
> > > > > >  xen/include/asm-arm/vgic.h |  6 +++++
> > > > > >  2 files changed, 56 insertions(+), 5 deletions(-)
> > > > > > 
> > > > > > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > > > > > index fd3fa05..8bf0578 100644
> > > > > > --- a/xen/arch/arm/gic.c
> > > > > > +++ b/xen/arch/arm/gic.c
> > > > > > @@ -375,6 +375,8 @@ static inline void gic_set_lr(int lr, struct
> > > > > > pending_irq *p,
> > > > > >  {
> > > > > >      ASSERT(!local_irq_is_enabled());
> > > > > > 
> > > > > > +    clear_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status);
> > > > > > +
> > > > > >      gic_hw_ops->update_lr(lr, p, state);
> > > > > > 
> > > > > >      set_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
> > > > > > @@ -442,12 +444,41 @@ void gic_raise_inflight_irq(struct vcpu *v,
> > > > > > unsigned int virtual_irq)
> > > > > >  #endif
> > > > > >  }
> > > > > > 
> > > > > > +/*
> > > > > > + * Find an unused LR to insert an IRQ into. If this new interrupt
> > > > > > is a
> > > > > > + * PRISTINE LPI, scan the other LRs to avoid inserting the same IRQ
> > > > > > twice.
> > > > > > + */
> > > > > > +static int gic_find_unused_lr(struct vcpu *v, struct pending_irq
> > > > > > *p,
> > > > > > int lr)
> > > > > > +{
> > > > > > +    unsigned int nr_lrs = gic_hw_ops->info->nr_lrs;
> > > > > > +    unsigned long *lr_mask = (unsigned long *) &this_cpu(lr_mask);
> > > > > > +    struct gic_lr lr_val;
> > > > > > +
> > > > > > +    ASSERT(spin_is_locked(&v->arch.vgic.lock));
> > > > > > +
> > > > > > +    if ( test_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status) )
> > > > > 
> > > > > Maybe we should add an "unlikely".
> > > > > 
> > > > > I can see how this would be OKish at runtime, but at boot time there
> > > > > might be a bunch of PRISTINE_LPIs, but no MAPDs have been issued yet,
> > > > > right?
> > > 
> > > You cannot have any PRISTINE_LPIs without any MAPDs done. This bit will be
> > > set
> > > when you do the first MAPTI.
> > > 
> > > > > 
> > > > > I have a suggestion, I'll leave it to you and Julien if you want to do
> > > > > this now, or maybe consider it as a TODO item. I am OK either way (I
> > > > > don't want to delay the ITS any longer).
> > > > > 
> > > > > I am thinking we should do this scanning only after at least one MAPD
> > > > > has been issued for a given cpu at least once. I would resurrect the
> > > > > idea of a DISCARD flag, but not on the pending_irq, that I believe
> > > > > it's
> > > > > difficult to handle, but a single global DISCARD flag per struct vcpu.
> > > > > 
> > > > > On MAPD, we set DISCARD for the target vcpu of the LPI we are
> > > > > dropping.
> > > > > Next time we want to inject a PRISTINE_IRQ on that cpu interface, we
> > > > > scan all LRs for interrupts with a NULL pending_irq. We remove those
> > > > > from LRs, then we remove the DISCARD flag.
> > > > > 
> > > > > Do you think it would work?
> > > 
> > > I don't understand the point to do that. Ok, you will get the first
> > > PRISTINE_LPI "fast" (though likely LRs will be empty), but all the other
> > > will
> > > be "slow" (though likely LRs will be empty too).
> > > 
> > > The pain to implement your suggestion does not seem to be worth it so far.
> > 
> > Let me explain it a bit better, I think I didn't clarify it well enough.
> > Let me also premise, that this would be fine to do later, it doesn't
> > have to be part of this patch.
> > 
> > When I wrote MAPD above, I meant actually any commands that delete an
> > existing pending_irq - vLPI mapping. Specifically, DISCARD, and MAPD
> > when the
> > 
> >     if ( !valid )
> >         /* Discard all events and remove pending LPIs. */
> >         its_unmap_device(its, devid);
> > 
> > code path is taken, which should not be the case at boot time, right?
> > Are there any other commands that remove a pending_irq - vLPI mapping
> > that I missed?
> 
> I don't think so.
> 
> > 
> > The idea is that we could add a VGIC_V3_LPIS_DISCARD flag to arch_vcpu.
> > VGIC_V3_LPIS_DISCARD is set on a DISCARD command, and on a MAPD (!valid)
> > command. If VGIC_V3_LPIS_DISCARD is not set, there is no need to scan
> > anything. If VGIC_V3_LPIS_DISCARD is set *and* we want to inject a
> > PRISTINE_IRQ, then we do the scanning.
> > 
> > When we do the scanning, we check all LRs for NULL pending_irq structs.
> > We clear them all in one go. Then we remove VGIC_V3_LPIS_DISCARD.
> 
> The problem we are trying to solve here is not because of NULL pending_irq
> structs. It is because of the previous interrupt may still be in LRs so we
> would end-up to have twice the LPIs in it. This will result to unpredictable
> behavior.
> 
> This could happen if you do:
> 
>     vCPU A               |   vCPU  B
>                          |
>     DISCARD vLPI1        |
>     MAPTI   vLPI1        |
> 
>                 interrupt injected on vCPU B
> 
>                          |   entering in the hyp
>                          |   clear_lrs
>                          |   vgic_vcpu_inject_irq
> 
> 
> clear_lrs will not remove the interrupt from LRs if it was already pending
> because pending_irq is not NULL anymore.
> 
> This is causing issue because we are trying to be clever with the LRs by not
> regenerating them at every entry/exit. This is causing trouble in many more
> place in the vGIC. IHMO we should attempt to regenerate them and see how this
> will affect the performance.

Yes but if pending_irq is not NULL, then it will be marked as PRISTINE,
so it is still recognizable. We can still "clean it up".


> > This way, we get all PRISTINE_LPI fast, except for the very first one
> > after a DISCARD or MAPD (!valid) command.
> > 
> > Does it make more sense now? What do you think?
> 
> To be honest, I think we are trying to think about premature optimization
> without any number. We should first look at the vGIC rework and then see if
> this code will stay in place (I have the feeling it will disappear).
 
OK, no problem. Let's revisit this in the future.

[v9,20/28] ARM: GICv3: handle unmapped LPIs

Commit Message

Comments

Patch