diff mbox series

x86/msi: prevent MSI entry re-writes of the same data

Message ID 20250228113237.6116-1-roger.pau@citrix.com (mailing list archive)
State New
Headers show
Series x86/msi: prevent MSI entry re-writes of the same data | expand

Commit Message

Roger Pau Monné Feb. 28, 2025, 11:32 a.m. UTC
Attempt to reduce the MSI entry writes, and the associated checking whether
memory decoding and MSI-X is enabled for the PCI device, when the MSI data
hasn't changed.

When using Interrupt Remapping the MSI entry will contain an index into
the remapping table, and it's in such remapping table where the MSI vector
and destination CPU is stored.  As such, when using interrupt remapping,
changes to the interrupt affinity shouldn't result in changes to the MSI
entry, and the MSI entry update can be avoided.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 xen/arch/x86/include/asm/msi.h      |  5 +++-
 xen/arch/x86/msi.c                  | 38 ++++++++++++++++++-----------
 xen/drivers/passthrough/vtd/iommu.c |  2 +-
 3 files changed, 29 insertions(+), 16 deletions(-)

Comments

Jan Beulich March 5, 2025, 10:30 a.m. UTC | #1
On 28.02.2025 12:32, Roger Pau Monne wrote:
> @@ -191,8 +193,6 @@ void msi_compose_msg(unsigned vector, const cpumask_t *cpu_mask, struct msi_msg
>  
>  static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
>  {
> -    entry->msg = *msg;
> -
>      if ( iommu_intremap != iommu_intremap_off )
>      {
>          int rc;
> @@ -203,6 +203,20 @@ static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
>              return rc;
>      }
>  
> +    /*
> +     * Avoid updating the MSI entry if the address and data fields haven't
> +     * changed.  When using interrupt remapping changing the MSI affinity
> +     * shouldn't change the interrupt remapping table index, and hence the MSI
> +     * address and data fields should remain the same.
> +     */
> +    if ( entry->msg.address == msg->address && entry->msg.data == msg->data )
> +    {
> +        entry->msg.dest32 = msg->dest32;
> +        return 0;
> +    }
> +
> +    entry->msg = *msg;

It is perhaps pure luck that iommu_update_ire_from_msi() doesn't use entry's
"msg" field, and hence that this re-arrangement is okay. It's unclear to me
whether going forward this might not bite us.

> @@ -1407,7 +1415,9 @@ int pci_restore_msi_state(struct pci_dev *pdev)
>          }
>          type = entry->msi_attrib.type;
>  
> -        msg = entry->msg;
> +        msg.dest32 = entry->msg.dest32;
> +        msi_compose_msg(desc->arch.vector, NULL, &msg);
> +        entry->msg = (typeof(entry->msg)){};
>          write_msi_msg(entry, &msg);

Hmm, this isn't exactly a "restore" then anymore. That said, re-constructing
the message may even be more correct. Then, however, the question is whether
passing NULL as msi_compose_msg()'s middle argument is really appropriate. A
little bit of commentary may be desirable here in any event, also as to need
to clear entry->msg.

There's (at least) one place where behavior changes with the change of what
we store in struct msi_desc's msg field (previously untranslated, now
translated): dump_msi() wants to use the untranslated form. I fear it can't
even re-construct some of the data it means to log (without reading from
the IRTE).

> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1182,7 +1182,7 @@ static void cf_check dma_msi_end(struct irq_desc *desc, u8 vector)
>  static void cf_check dma_msi_set_affinity(
>      struct irq_desc *desc, const cpumask_t *mask)
>  {
> -    struct msi_msg msg;
> +    struct msi_msg msg = {};
>      unsigned int dest;
>      unsigned long flags;
>      struct vtd_iommu *iommu = desc->action->dev_id;

Why not a similar transformation as you do in set_msi_affinity(), eliminating
the local "dest"?

A change like the one here is likely needed in __hpet_setup_msi_irq(), to
prevent accidental "uninitialized struct field" warnings.
hpet_msi_set_affinity() might then also want to use msi_compose_msg(), albeit
that may also be regarded as an independent change.

Jan
Roger Pau Monné March 5, 2025, 5:57 p.m. UTC | #2
On Wed, Mar 05, 2025 at 11:30:51AM +0100, Jan Beulich wrote:
> On 28.02.2025 12:32, Roger Pau Monne wrote:
> > @@ -191,8 +193,6 @@ void msi_compose_msg(unsigned vector, const cpumask_t *cpu_mask, struct msi_msg
> >  
> >  static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
> >  {
> > -    entry->msg = *msg;
> > -
> >      if ( iommu_intremap != iommu_intremap_off )
> >      {
> >          int rc;
> > @@ -203,6 +203,20 @@ static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
> >              return rc;
> >      }
> >  
> > +    /*
> > +     * Avoid updating the MSI entry if the address and data fields haven't
> > +     * changed.  When using interrupt remapping changing the MSI affinity
> > +     * shouldn't change the interrupt remapping table index, and hence the MSI
> > +     * address and data fields should remain the same.
> > +     */
> > +    if ( entry->msg.address == msg->address && entry->msg.data == msg->data )
> > +    {
> > +        entry->msg.dest32 = msg->dest32;
> > +        return 0;
> > +    }
> > +
> > +    entry->msg = *msg;
> 
> It is perhaps pure luck that iommu_update_ire_from_msi() doesn't use entry's
> "msg" field, and hence that this re-arrangement is okay. It's unclear to me
> whether going forward this might not bite us.

I've updated the comment in msi_desc to make it clear what the `msg`
field contains.  If iommu_update_ire_from_msi() has a need to fetch
the previous non-translated data and address fields it could always
add an extra field to msi_desc.

> > @@ -1407,7 +1415,9 @@ int pci_restore_msi_state(struct pci_dev *pdev)
> >          }
> >          type = entry->msi_attrib.type;
> >  
> > -        msg = entry->msg;
> > +        msg.dest32 = entry->msg.dest32;
> > +        msi_compose_msg(desc->arch.vector, NULL, &msg);
> > +        entry->msg = (typeof(entry->msg)){};
> >          write_msi_msg(entry, &msg);
> 
> Hmm, this isn't exactly a "restore" then anymore. That said, re-constructing
> the message may even be more correct. Then, however, the question is whether
> passing NULL as msi_compose_msg()'s middle argument is really appropriate. A
> little bit of commentary may be desirable here in any event, also as to need
> to clear entry->msg.

I can add a comment.  Note that as part of the patch a comment is
already added to both the msi_compose_msg() prototype and definition
regarding what passing a NULL cpu_mask implies.

> 
> There's (at least) one place where behavior changes with the change of what
> we store in struct msi_desc's msg field (previously untranslated, now
> translated): dump_msi() wants to use the untranslated form. I fear it can't
> even re-construct some of the data it means to log (without reading from
> the IRTE).

Oh, I've missed dump_msi().  Let me see what I can do there.

Another possibility is for iommu_update_ire_from_msi() to report
whether the IRTE index has changed, and so the MSI fields have been
updated and need propagating to the hardware.

> > --- a/xen/drivers/passthrough/vtd/iommu.c
> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> > @@ -1182,7 +1182,7 @@ static void cf_check dma_msi_end(struct irq_desc *desc, u8 vector)
> >  static void cf_check dma_msi_set_affinity(
> >      struct irq_desc *desc, const cpumask_t *mask)
> >  {
> > -    struct msi_msg msg;
> > +    struct msi_msg msg = {};
> >      unsigned int dest;
> >      unsigned long flags;
> >      struct vtd_iommu *iommu = desc->action->dev_id;
> 
> Why not a similar transformation as you do in set_msi_affinity(), eliminating
> the local "dest"?

It was more intrusive, but I can certainly do it.

> A change like the one here is likely needed in __hpet_setup_msi_irq(), to
> prevent accidental "uninitialized struct field" warnings.

Hm, won't the struct be fully initialized in that case, by getting
passed a cpu_mask?  I don't mind doing so however.

> hpet_msi_set_affinity() might then also want to use msi_compose_msg(), albeit
> that may also be regarded as an independent change.

Possibly, note that HPET code doesn't use write_msi_msg(), and hence
is only partially affected by the msi_compose_msg() change, but not
the write_msi_msg() one, as it continues to store the untranslated MSI
address and data fields in hpet_event_channel struct.

Thanks, Roger.
Jan Beulich March 6, 2025, 10:39 a.m. UTC | #3
On 05.03.2025 18:57, Roger Pau Monné wrote:
> On Wed, Mar 05, 2025 at 11:30:51AM +0100, Jan Beulich wrote:
>> On 28.02.2025 12:32, Roger Pau Monne wrote:
>>> @@ -1407,7 +1415,9 @@ int pci_restore_msi_state(struct pci_dev *pdev)
>>>          }
>>>          type = entry->msi_attrib.type;
>>>  
>>> -        msg = entry->msg;
>>> +        msg.dest32 = entry->msg.dest32;
>>> +        msi_compose_msg(desc->arch.vector, NULL, &msg);
>>> +        entry->msg = (typeof(entry->msg)){};
>>>          write_msi_msg(entry, &msg);
>>
>> Hmm, this isn't exactly a "restore" then anymore. That said, re-constructing
>> the message may even be more correct. Then, however, the question is whether
>> passing NULL as msi_compose_msg()'s middle argument is really appropriate. A
>> little bit of commentary may be desirable here in any event, also as to need
>> to clear entry->msg.
> 
> I can add a comment.  Note that as part of the patch a comment is
> already added to both the msi_compose_msg() prototype and definition
> regarding what passing a NULL cpu_mask implies.

Right; the comment I'm asking for here is to explain why it's not really a
restore that we do, but a re-build.

>>> --- a/xen/drivers/passthrough/vtd/iommu.c
>>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>>> @@ -1182,7 +1182,7 @@ static void cf_check dma_msi_end(struct irq_desc *desc, u8 vector)
>>>  static void cf_check dma_msi_set_affinity(
>>>      struct irq_desc *desc, const cpumask_t *mask)
>>>  {
>>> -    struct msi_msg msg;
>>> +    struct msi_msg msg = {};
>>>      unsigned int dest;
>>>      unsigned long flags;
>>>      struct vtd_iommu *iommu = desc->action->dev_id;
>>
>> Why not a similar transformation as you do in set_msi_affinity(), eliminating
>> the local "dest"?
> 
> It was more intrusive, but I can certainly do it.
> 
>> A change like the one here is likely needed in __hpet_setup_msi_irq(), to
>> prevent accidental "uninitialized struct field" warnings.
> 
> Hm, won't the struct be fully initialized in that case, by getting
> passed a cpu_mask?

Oh, of course. No idea what I was thinking ...

>  I don't mind doing so however.

No need to then, I guess.

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h
index 378b85ee947b..4301c58c7c4d 100644
--- a/xen/arch/x86/include/asm/msi.h
+++ b/xen/arch/x86/include/asm/msi.h
@@ -124,7 +124,8 @@  struct msi_desc {
     int irq;
     int remap_index;         /* index in interrupt remapping table */
 
-    struct msi_msg msg;      /* Last set MSI message */
+    /* Last set MSI message in remappable format if applicable */
+    struct msi_msg msg;
 };
 
 /*
@@ -236,6 +237,8 @@  struct arch_msix {
 };
 
 void early_msi_init(void);
+
+/* If cpu_mask is NULL msg->dest32 is used as the destination APIC ID. */
 void msi_compose_msg(unsigned vector, const cpumask_t *cpu_mask,
                      struct msi_msg *msg);
 void __msi_set_enable(pci_sbdf_t sbdf, int pos, int enable);
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index bf5b71822ea9..1905832be317 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -152,11 +152,13 @@  static bool msix_memory_decoded(const struct pci_dev *dev, unsigned int pos)
 }
 
 /*
- * MSI message composition
+ * MSI message composition.
+ * If cpu_mask is NULL msg->dest32 is used as the destination APIC ID.
  */
 void msi_compose_msg(unsigned vector, const cpumask_t *cpu_mask, struct msi_msg *msg)
 {
-    memset(msg, 0, sizeof(*msg));
+    msg->address = 0;
+    msg->data = 0;
 
     if ( vector < FIRST_DYNAMIC_VECTOR )
         return;
@@ -191,8 +193,6 @@  void msi_compose_msg(unsigned vector, const cpumask_t *cpu_mask, struct msi_msg
 
 static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
 {
-    entry->msg = *msg;
-
     if ( iommu_intremap != iommu_intremap_off )
     {
         int rc;
@@ -203,6 +203,20 @@  static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
             return rc;
     }
 
+    /*
+     * Avoid updating the MSI entry if the address and data fields haven't
+     * changed.  When using interrupt remapping changing the MSI affinity
+     * shouldn't change the interrupt remapping table index, and hence the MSI
+     * address and data fields should remain the same.
+     */
+    if ( entry->msg.address == msg->address && entry->msg.data == msg->data )
+    {
+        entry->msg.dest32 = msg->dest32;
+        return 0;
+    }
+
+    entry->msg = *msg;
+
     switch ( entry->msi_attrib.type )
     {
     case PCI_CAP_ID_MSI:
@@ -248,21 +262,15 @@  static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
 void cf_check set_msi_affinity(struct irq_desc *desc, const cpumask_t *mask)
 {
     struct msi_msg msg;
-    unsigned int dest;
     struct msi_desc *msi_desc = desc->msi_desc;
 
-    dest = set_desc_affinity(desc, mask);
-    if ( dest == BAD_APICID || !msi_desc )
+    msg.dest32 = set_desc_affinity(desc, mask);
+    if ( msg.dest32 == BAD_APICID || !msi_desc )
         return;
 
     ASSERT(spin_is_locked(&desc->lock));
 
-    msg = msi_desc->msg;
-    msg.data &= ~MSI_DATA_VECTOR_MASK;
-    msg.data |= MSI_DATA_VECTOR(desc->arch.vector);
-    msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
-    msg.address_lo |= MSI_ADDR_DEST_ID(dest);
-    msg.dest32 = dest;
+    msi_compose_msg(desc->arch.vector, NULL, &msg);
 
     write_msi_msg(msi_desc, &msg);
 }
@@ -1407,7 +1415,9 @@  int pci_restore_msi_state(struct pci_dev *pdev)
         }
         type = entry->msi_attrib.type;
 
-        msg = entry->msg;
+        msg.dest32 = entry->msg.dest32;
+        msi_compose_msg(desc->arch.vector, NULL, &msg);
+        entry->msg = (typeof(entry->msg)){};
         write_msi_msg(entry, &msg);
 
         for ( i = 0; ; )
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index a1927d9f126d..aa00290e7f77 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1182,7 +1182,7 @@  static void cf_check dma_msi_end(struct irq_desc *desc, u8 vector)
 static void cf_check dma_msi_set_affinity(
     struct irq_desc *desc, const cpumask_t *mask)
 {
-    struct msi_msg msg;
+    struct msi_msg msg = {};
     unsigned int dest;
     unsigned long flags;
     struct vtd_iommu *iommu = desc->action->dev_id;