Message ID | 20250228113237.6116-1-roger.pau@citrix.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | x86/msi: prevent MSI entry re-writes of the same data | expand |
On 28.02.2025 12:32, Roger Pau Monne wrote: > @@ -191,8 +193,6 @@ void msi_compose_msg(unsigned vector, const cpumask_t *cpu_mask, struct msi_msg > > static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg) > { > - entry->msg = *msg; > - > if ( iommu_intremap != iommu_intremap_off ) > { > int rc; > @@ -203,6 +203,20 @@ static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg) > return rc; > } > > + /* > + * Avoid updating the MSI entry if the address and data fields haven't > + * changed. When using interrupt remapping changing the MSI affinity > + * shouldn't change the interrupt remapping table index, and hence the MSI > + * address and data fields should remain the same. > + */ > + if ( entry->msg.address == msg->address && entry->msg.data == msg->data ) > + { > + entry->msg.dest32 = msg->dest32; > + return 0; > + } > + > + entry->msg = *msg; It is perhaps pure luck that iommu_update_ire_from_msi() doesn't use entry's "msg" field, and hence that this re-arrangement is okay. It's unclear to me whether going forward this might not bite us. > @@ -1407,7 +1415,9 @@ int pci_restore_msi_state(struct pci_dev *pdev) > } > type = entry->msi_attrib.type; > > - msg = entry->msg; > + msg.dest32 = entry->msg.dest32; > + msi_compose_msg(desc->arch.vector, NULL, &msg); > + entry->msg = (typeof(entry->msg)){}; > write_msi_msg(entry, &msg); Hmm, this isn't exactly a "restore" then anymore. That said, re-constructing the message may even be more correct. Then, however, the question is whether passing NULL as msi_compose_msg()'s middle argument is really appropriate. A little bit of commentary may be desirable here in any event, also as to need to clear entry->msg. There's (at least) one place where behavior changes with the change of what we store in struct msi_desc's msg field (previously untranslated, now translated): dump_msi() wants to use the untranslated form. I fear it can't even re-construct some of the data it means to log (without reading from the IRTE). > --- a/xen/drivers/passthrough/vtd/iommu.c > +++ b/xen/drivers/passthrough/vtd/iommu.c > @@ -1182,7 +1182,7 @@ static void cf_check dma_msi_end(struct irq_desc *desc, u8 vector) > static void cf_check dma_msi_set_affinity( > struct irq_desc *desc, const cpumask_t *mask) > { > - struct msi_msg msg; > + struct msi_msg msg = {}; > unsigned int dest; > unsigned long flags; > struct vtd_iommu *iommu = desc->action->dev_id; Why not a similar transformation as you do in set_msi_affinity(), eliminating the local "dest"? A change like the one here is likely needed in __hpet_setup_msi_irq(), to prevent accidental "uninitialized struct field" warnings. hpet_msi_set_affinity() might then also want to use msi_compose_msg(), albeit that may also be regarded as an independent change. Jan
On Wed, Mar 05, 2025 at 11:30:51AM +0100, Jan Beulich wrote: > On 28.02.2025 12:32, Roger Pau Monne wrote: > > @@ -191,8 +193,6 @@ void msi_compose_msg(unsigned vector, const cpumask_t *cpu_mask, struct msi_msg > > > > static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg) > > { > > - entry->msg = *msg; > > - > > if ( iommu_intremap != iommu_intremap_off ) > > { > > int rc; > > @@ -203,6 +203,20 @@ static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg) > > return rc; > > } > > > > + /* > > + * Avoid updating the MSI entry if the address and data fields haven't > > + * changed. When using interrupt remapping changing the MSI affinity > > + * shouldn't change the interrupt remapping table index, and hence the MSI > > + * address and data fields should remain the same. > > + */ > > + if ( entry->msg.address == msg->address && entry->msg.data == msg->data ) > > + { > > + entry->msg.dest32 = msg->dest32; > > + return 0; > > + } > > + > > + entry->msg = *msg; > > It is perhaps pure luck that iommu_update_ire_from_msi() doesn't use entry's > "msg" field, and hence that this re-arrangement is okay. It's unclear to me > whether going forward this might not bite us. I've updated the comment in msi_desc to make it clear what the `msg` field contains. If iommu_update_ire_from_msi() has a need to fetch the previous non-translated data and address fields it could always add an extra field to msi_desc. > > @@ -1407,7 +1415,9 @@ int pci_restore_msi_state(struct pci_dev *pdev) > > } > > type = entry->msi_attrib.type; > > > > - msg = entry->msg; > > + msg.dest32 = entry->msg.dest32; > > + msi_compose_msg(desc->arch.vector, NULL, &msg); > > + entry->msg = (typeof(entry->msg)){}; > > write_msi_msg(entry, &msg); > > Hmm, this isn't exactly a "restore" then anymore. That said, re-constructing > the message may even be more correct. Then, however, the question is whether > passing NULL as msi_compose_msg()'s middle argument is really appropriate. A > little bit of commentary may be desirable here in any event, also as to need > to clear entry->msg. I can add a comment. Note that as part of the patch a comment is already added to both the msi_compose_msg() prototype and definition regarding what passing a NULL cpu_mask implies. > > There's (at least) one place where behavior changes with the change of what > we store in struct msi_desc's msg field (previously untranslated, now > translated): dump_msi() wants to use the untranslated form. I fear it can't > even re-construct some of the data it means to log (without reading from > the IRTE). Oh, I've missed dump_msi(). Let me see what I can do there. Another possibility is for iommu_update_ire_from_msi() to report whether the IRTE index has changed, and so the MSI fields have been updated and need propagating to the hardware. > > --- a/xen/drivers/passthrough/vtd/iommu.c > > +++ b/xen/drivers/passthrough/vtd/iommu.c > > @@ -1182,7 +1182,7 @@ static void cf_check dma_msi_end(struct irq_desc *desc, u8 vector) > > static void cf_check dma_msi_set_affinity( > > struct irq_desc *desc, const cpumask_t *mask) > > { > > - struct msi_msg msg; > > + struct msi_msg msg = {}; > > unsigned int dest; > > unsigned long flags; > > struct vtd_iommu *iommu = desc->action->dev_id; > > Why not a similar transformation as you do in set_msi_affinity(), eliminating > the local "dest"? It was more intrusive, but I can certainly do it. > A change like the one here is likely needed in __hpet_setup_msi_irq(), to > prevent accidental "uninitialized struct field" warnings. Hm, won't the struct be fully initialized in that case, by getting passed a cpu_mask? I don't mind doing so however. > hpet_msi_set_affinity() might then also want to use msi_compose_msg(), albeit > that may also be regarded as an independent change. Possibly, note that HPET code doesn't use write_msi_msg(), and hence is only partially affected by the msi_compose_msg() change, but not the write_msi_msg() one, as it continues to store the untranslated MSI address and data fields in hpet_event_channel struct. Thanks, Roger.
On 05.03.2025 18:57, Roger Pau Monné wrote: > On Wed, Mar 05, 2025 at 11:30:51AM +0100, Jan Beulich wrote: >> On 28.02.2025 12:32, Roger Pau Monne wrote: >>> @@ -1407,7 +1415,9 @@ int pci_restore_msi_state(struct pci_dev *pdev) >>> } >>> type = entry->msi_attrib.type; >>> >>> - msg = entry->msg; >>> + msg.dest32 = entry->msg.dest32; >>> + msi_compose_msg(desc->arch.vector, NULL, &msg); >>> + entry->msg = (typeof(entry->msg)){}; >>> write_msi_msg(entry, &msg); >> >> Hmm, this isn't exactly a "restore" then anymore. That said, re-constructing >> the message may even be more correct. Then, however, the question is whether >> passing NULL as msi_compose_msg()'s middle argument is really appropriate. A >> little bit of commentary may be desirable here in any event, also as to need >> to clear entry->msg. > > I can add a comment. Note that as part of the patch a comment is > already added to both the msi_compose_msg() prototype and definition > regarding what passing a NULL cpu_mask implies. Right; the comment I'm asking for here is to explain why it's not really a restore that we do, but a re-build. >>> --- a/xen/drivers/passthrough/vtd/iommu.c >>> +++ b/xen/drivers/passthrough/vtd/iommu.c >>> @@ -1182,7 +1182,7 @@ static void cf_check dma_msi_end(struct irq_desc *desc, u8 vector) >>> static void cf_check dma_msi_set_affinity( >>> struct irq_desc *desc, const cpumask_t *mask) >>> { >>> - struct msi_msg msg; >>> + struct msi_msg msg = {}; >>> unsigned int dest; >>> unsigned long flags; >>> struct vtd_iommu *iommu = desc->action->dev_id; >> >> Why not a similar transformation as you do in set_msi_affinity(), eliminating >> the local "dest"? > > It was more intrusive, but I can certainly do it. > >> A change like the one here is likely needed in __hpet_setup_msi_irq(), to >> prevent accidental "uninitialized struct field" warnings. > > Hm, won't the struct be fully initialized in that case, by getting > passed a cpu_mask? Oh, of course. No idea what I was thinking ... > I don't mind doing so however. No need to then, I guess. Jan
diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h index 378b85ee947b..4301c58c7c4d 100644 --- a/xen/arch/x86/include/asm/msi.h +++ b/xen/arch/x86/include/asm/msi.h @@ -124,7 +124,8 @@ struct msi_desc { int irq; int remap_index; /* index in interrupt remapping table */ - struct msi_msg msg; /* Last set MSI message */ + /* Last set MSI message in remappable format if applicable */ + struct msi_msg msg; }; /* @@ -236,6 +237,8 @@ struct arch_msix { }; void early_msi_init(void); + +/* If cpu_mask is NULL msg->dest32 is used as the destination APIC ID. */ void msi_compose_msg(unsigned vector, const cpumask_t *cpu_mask, struct msi_msg *msg); void __msi_set_enable(pci_sbdf_t sbdf, int pos, int enable); diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c index bf5b71822ea9..1905832be317 100644 --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -152,11 +152,13 @@ static bool msix_memory_decoded(const struct pci_dev *dev, unsigned int pos) } /* - * MSI message composition + * MSI message composition. + * If cpu_mask is NULL msg->dest32 is used as the destination APIC ID. */ void msi_compose_msg(unsigned vector, const cpumask_t *cpu_mask, struct msi_msg *msg) { - memset(msg, 0, sizeof(*msg)); + msg->address = 0; + msg->data = 0; if ( vector < FIRST_DYNAMIC_VECTOR ) return; @@ -191,8 +193,6 @@ void msi_compose_msg(unsigned vector, const cpumask_t *cpu_mask, struct msi_msg static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg) { - entry->msg = *msg; - if ( iommu_intremap != iommu_intremap_off ) { int rc; @@ -203,6 +203,20 @@ static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg) return rc; } + /* + * Avoid updating the MSI entry if the address and data fields haven't + * changed. When using interrupt remapping changing the MSI affinity + * shouldn't change the interrupt remapping table index, and hence the MSI + * address and data fields should remain the same. + */ + if ( entry->msg.address == msg->address && entry->msg.data == msg->data ) + { + entry->msg.dest32 = msg->dest32; + return 0; + } + + entry->msg = *msg; + switch ( entry->msi_attrib.type ) { case PCI_CAP_ID_MSI: @@ -248,21 +262,15 @@ static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg) void cf_check set_msi_affinity(struct irq_desc *desc, const cpumask_t *mask) { struct msi_msg msg; - unsigned int dest; struct msi_desc *msi_desc = desc->msi_desc; - dest = set_desc_affinity(desc, mask); - if ( dest == BAD_APICID || !msi_desc ) + msg.dest32 = set_desc_affinity(desc, mask); + if ( msg.dest32 == BAD_APICID || !msi_desc ) return; ASSERT(spin_is_locked(&desc->lock)); - msg = msi_desc->msg; - msg.data &= ~MSI_DATA_VECTOR_MASK; - msg.data |= MSI_DATA_VECTOR(desc->arch.vector); - msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK; - msg.address_lo |= MSI_ADDR_DEST_ID(dest); - msg.dest32 = dest; + msi_compose_msg(desc->arch.vector, NULL, &msg); write_msi_msg(msi_desc, &msg); } @@ -1407,7 +1415,9 @@ int pci_restore_msi_state(struct pci_dev *pdev) } type = entry->msi_attrib.type; - msg = entry->msg; + msg.dest32 = entry->msg.dest32; + msi_compose_msg(desc->arch.vector, NULL, &msg); + entry->msg = (typeof(entry->msg)){}; write_msi_msg(entry, &msg); for ( i = 0; ; ) diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c index a1927d9f126d..aa00290e7f77 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -1182,7 +1182,7 @@ static void cf_check dma_msi_end(struct irq_desc *desc, u8 vector) static void cf_check dma_msi_set_affinity( struct irq_desc *desc, const cpumask_t *mask) { - struct msi_msg msg; + struct msi_msg msg = {}; unsigned int dest; unsigned long flags; struct vtd_iommu *iommu = desc->action->dev_id;
Attempt to reduce the MSI entry writes, and the associated checking whether memory decoding and MSI-X is enabled for the PCI device, when the MSI data hasn't changed. When using Interrupt Remapping the MSI entry will contain an index into the remapping table, and it's in such remapping table where the MSI vector and destination CPU is stored. As such, when using interrupt remapping, changes to the interrupt affinity shouldn't result in changes to the MSI entry, and the MSI entry update can be avoided. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- Cc: Ross Lagerwall <ross.lagerwall@citrix.com> --- xen/arch/x86/include/asm/msi.h | 5 +++- xen/arch/x86/msi.c | 38 ++++++++++++++++++----------- xen/drivers/passthrough/vtd/iommu.c | 2 +- 3 files changed, 29 insertions(+), 16 deletions(-)