
[v5,1/7] x86/msi: passthrough all MSI-X vector ctrl writes to device model

Message ID 3e1150560a41bd567049627d684cd4e754530869.1710342968.git-series.marmarek@invisiblethingslab.com (mailing list archive)
State Superseded
Series MSI-X support with qemu in stubdomain, and other related changes

Commit Message

Marek Marczykowski-Górecki March 13, 2024, 3:16 p.m. UTC
QEMU needs to know whether a write clearing the maskbit of a vector
actually changes its state, or whether the bit was already clear.
Currently Xen forwards only writes clearing that bit to the device
model, not those setting it, so QEMU cannot tell the difference. QEMU
works around this by reading the entry via /dev/mem, which isn't the
proper approach.

Give QEMU all the necessary information by forwarding all ctrl writes,
including those masking a vector. Advertise the new behavior via
XENVER_get_features, so QEMU knows it no longer needs to access
/dev/mem.
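
As an illustration (not part of the patch), a device model could probe
for the new behavior roughly as below. The helper name is hypothetical;
xc_version() and XENVER_get_features are the existing libxenctrl
interface. When the bit is absent, QEMU would keep the /dev/mem
fallback.

#include <stdbool.h>
#include <xenctrl.h>
#include <xen/features.h>

/* Hypothetical helper: does Xen forward all MSI-X vector ctrl writes? */
static bool xen_forwards_all_msix_ctrl_writes(xc_interface *xch)
{
    xen_feature_info_t fi = { .submap_idx = 0 };

    /* xc_version() returns a negative value on failure. */
    if ( xc_version(xch, XENVER_get_features, &fi) < 0 )
        return false; /* be conservative: assume the old, filtered behavior */

    return fi.submap & (1U << XENFEAT_dm_msix_all_writes);
}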

While this commit doesn't move the whole maskbit handling to QEMU (as
discussed on xen-devel as one of the possibilities), it is a necessary
first step anyway, including telling QEMU that it will get all the
information required to do so. The actual implementation would also
need:
 - a hypercall for QEMU to control just the maskbit (without
   (re)binding the interrupt again)
 - a method for QEMU to tell Xen it will actually do the work
Those are not part of this series.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
I did not add any control to enable/disable this new behavior (as
Roger suggested for possible non-QEMU ioreq servers). I don't see how
the new behavior could be problematic for an existing ioreq server
(they already received writes to those addresses, just not all of
them), but if that's really necessary, I can probably add a command
line option to restore the previous behavior system-wide.

Changes in v5:
- announce the feature only on x86
- style fixes

Changes in v4:
- ignore unaligned writes with X86EMUL_OKAY
- restructure the code to forward all writes in _msixtbl_write() instead
  of manipulating return value of msixtbl_write() - this makes
  WRITE_LEN4_COMPLETION special case unnecessary
- advertise the changed behavior via XENVER_get_features instead of DMOP

Changes in v3:
- advertise changed behavior in XEN_DMOP_get_ioreq_server_info - make
  "flags" parameter IN/OUT
- move len check back to msixtbl_write() - will be needed there anyway
  in a later patch

Changes in v2:
- passthrough quad writes to emulator too (Jan)
- (ab)use len==0 for write len=4 completion (Jan), but add descriptive
  #define for this magic value
---
 xen/arch/x86/hvm/vmsi.c       | 19 ++++++++++++++-----
 xen/common/kernel.c           |  1 +
 xen/include/public/features.h |  8 ++++++++
 3 files changed, 23 insertions(+), 5 deletions(-)

Comments

Jan Beulich April 25, 2024, 10:45 a.m. UTC | #1
On 13.03.2024 16:16, Marek Marczykowski-Górecki wrote:
> QEMU needs to know whether a write clearing the maskbit of a vector
> actually changes its state, or whether the bit was already clear.
> Currently Xen forwards only writes clearing that bit to the device
> model, not those setting it, so QEMU cannot tell the difference. QEMU
> works around this by reading the entry via /dev/mem, which isn't the
> proper approach.
> 
> Give QEMU all the necessary information by forwarding all ctrl writes,
> including those masking a vector. Advertise the new behavior via
> XENVER_get_features, so QEMU knows it no longer needs to access
> /dev/mem.
> 
> While this commit doesn't move the whole maskbit handling to QEMU (as
> discussed on xen-devel as one of the possibilities), it is a necessary
> first step anyway, including telling QEMU that it will get all the
> information required to do so. The actual implementation would also
> need:
>  - a hypercall for QEMU to control just the maskbit (without
>    (re)binding the interrupt again)
>  - a method for QEMU to tell Xen it will actually do the work
> Those are not part of this series.
> 
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> ---
> I did not add any control to enable/disable this new behavior (as
> Roger suggested for possible non-QEMU ioreq servers). I don't see how
> the new behavior could be problematic for an existing ioreq server
> (they already received writes to those addresses, just not all of
> them), but if that's really necessary, I can probably add a command
> line option to restore the previous behavior system-wide.

Roger, please indicate whether you consider things to be okay to go in
as is, or whether you demand this earlier concern of yours to be addressed
by adding a command line option (or even finer-grained control).

Jan

Patch

diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index adbac965f9f7..999917983789 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -283,8 +283,8 @@  static int msixtbl_write(struct vcpu *v, unsigned long address,
     unsigned long flags;
     struct irq_desc *desc;
 
-    if ( (len != 4 && len != 8) || (address & (len - 1)) )
-        return r;
+    if ( !IS_ALIGNED(address, len) )
+        return X86EMUL_OKAY;
 
     rcu_read_lock(&msixtbl_rcu_lock);
 
@@ -345,8 +345,7 @@  static int msixtbl_write(struct vcpu *v, unsigned long address,
 
 unlock:
     spin_unlock_irqrestore(&desc->lock, flags);
-    if ( len == 4 )
-        r = X86EMUL_OKAY;
+    r = X86EMUL_OKAY;
 
 out:
     rcu_read_unlock(&msixtbl_rcu_lock);
@@ -357,7 +356,17 @@  static int cf_check _msixtbl_write(
     const struct hvm_io_handler *handler, uint64_t address, uint32_t len,
     uint64_t val)
 {
-    return msixtbl_write(current, address, len, val);
+    /* Ignore invalid length or unaligned writes. */
+    if ( (len != 4 && len != 8) || !IS_ALIGNED(address, len) )
+        return X86EMUL_OKAY;
+
+    /*
+     * This function returns X86EMUL_UNHANDLEABLE even if the write is
+     * properly handled, to propagate it to the device model (so it can
+     * keep its internal state in sync).
+     */
+    msixtbl_write(current, address, len, val);
+    return X86EMUL_UNHANDLEABLE;
 }
 
 static bool cf_check msixtbl_range(
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 08dbaa2a054c..b44b2439ca8e 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -637,6 +637,7 @@  long do_xen_version(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
                 fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
                              (1U << XENFEAT_hvm_callback_vector) |
                              (has_pirq(d) ? (1U << XENFEAT_hvm_pirqs) : 0);
+            fi.submap |= (1U << XENFEAT_dm_msix_all_writes);
 #endif
             if ( !paging_mode_translate(d) || is_domain_direct_mapped(d) )
                 fi.submap |= (1U << XENFEAT_direct_mapped);
diff --git a/xen/include/public/features.h b/xen/include/public/features.h
index 4437f25d2532..880193094713 100644
--- a/xen/include/public/features.h
+++ b/xen/include/public/features.h
@@ -120,6 +120,14 @@ 
 #define XENFEAT_runstate_phys_area        18
 #define XENFEAT_vcpu_time_phys_area       19
 
+/*
+ * If set, Xen will pass through all MSI-X vector ctrl writes to the
+ * device model, not only those unmasking an entry. This allows the
+ * device model to properly track the MSI-X table without reading it
+ * from the device behind Xen's back. Only relevant to device models.
+ */
+#define XENFEAT_dm_msix_all_writes        20
+
 #define XENFEAT_NR_SUBMAPS 1
 
 #endif /* __XEN_PUBLIC_FEATURES_H__ */
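
For completeness, a sketch of what the forwarded writes enable on the
device-model side. Everything below is illustrative (the names,
constants, and table limit are assumptions, not QEMU's actual code):
once every vector ctrl write arrives, a shadow copy of each entry's
ctrl dword is enough to tell genuine mask/unmask transitions apart from
writes that leave the maskbit unchanged, with no /dev/mem reads.

#include <stdbool.h>
#include <stdint.h>

#define VEC_CTRL_MASKBIT   0x1u  /* maskbit in the vector ctrl dword */
#define MSIX_MAX_ENTRIES   2048  /* architectural MSI-X table limit */

/* Hypothetical device-model state: shadow of each entry's ctrl dword. */
static uint32_t shadow_vec_ctrl[MSIX_MAX_ENTRIES];

/*
 * Called for every forwarded vector ctrl write once
 * XENFEAT_dm_msix_all_writes is advertised.
 */
static void msix_vec_ctrl_write(unsigned int entry, uint32_t val)
{
    bool was_masked = shadow_vec_ctrl[entry] & VEC_CTRL_MASKBIT;
    bool now_masked = val & VEC_CTRL_MASKBIT;

    shadow_vec_ctrl[entry] = val;

    if ( was_masked && !now_masked )
    {
        /* Genuine unmask: update internal state / (re)bind as needed. */
    }
    else if ( !was_masked && now_masked )
    {
        /* Genuine mask: previously invisible without the /dev/mem hack. */
    }
    /* Otherwise the maskbit didn't change; nothing to do. */
}

With the feature bit set the shadow can never go stale, which is
exactly the property the /dev/mem workaround tried to recover.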