Message ID | 20220811153739.3079672-5-fanjinhao21s@ict.ac.cn (mailing list archive)
---|---
State | New, archived
Series | hw/nvme: add irqfd support
On Aug 11 23:37, Jinhao Fan wrote:
> When irqfd is enabled, we bypass QEMU's irq emulation and let KVM
> directly assert the irq. However, KVM is not aware of the device's
> MSI-X masking status. Add MSI-X mask bookkeeping to the NVMe emulation
> and detach the corresponding irqfd when a vector is masked.

Did qtest work out for you for testing? If so, it would be nice to add a
simple test case as well.
On 8/16/2022, at 6:46 PM, Klaus Jensen <its@irrelevant.dk> wrote:
> Did qtest work out for you for testing? If so, it would be nice to add a
> simple test case as well.

I found MSI-X masking harder to test than we imagined. My plan is to emulate
only the IO queues in the iothread and leave admin queue emulation in the
main loop, since some admin commands require the BQL. So I did not enable
irqfd on the admin queues. Therefore we can only test MSI-X masking on IO
queues. This makes qtest complicated, since we need to handle IO queue
creation.

But I'm not sure my understanding is correct. Is it true that the admin
queue does not need irqfd as long as it runs in the main loop thread?
On Aug 11 23:37, Jinhao Fan wrote:
> When irqfd is enabled, we bypass QEMU's irq emulation and let KVM
> directly assert the irq. However, KVM is not aware of the device's
> MSI-X masking status. Add MSI-X mask bookkeeping to the NVMe emulation
> and detach the corresponding irqfd when a vector is masked.
>
> Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
> ---
>  hw/nvme/ctrl.c       | 82 ++++++++++++++++++++++++++++++++++++++++++++
>  hw/nvme/nvme.h       |  2 ++
>  hw/nvme/trace-events |  3 ++
>  3 files changed, 87 insertions(+)
>
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index 63f988f2f9..ac5460c7c8 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -7478,10 +7478,84 @@ static int nvme_add_pm_capability(PCIDevice *pci_dev, uint8_t offset)
>
>      return 0;
>  }
> +static int nvme_vector_unmask(PCIDevice *pci_dev, unsigned vector,
> +                              MSIMessage msg)
> +{
> +    NvmeCtrl *n = NVME(pci_dev);
> +    int ret;
> +
> +    trace_pci_nvme_irq_unmask(vector, msg.address, msg.data);
> +
> +    for (uint32_t i = 0; i < n->params.max_ioqpairs + 1; i++) {

This loop is OK for now, but could be done better with a reverse mapping.
This is just an observation. Don't spend time on it since we are not aware
of any host actually doing masking/unmasking.

> +        NvmeCQueue *cq = n->cq[i];
> +        /*
> +         * If this function is called, then irqfd must be available. Therefore,
> +         * irqfd must be in use if cq->assert_notifier.initialized is true.
> +         */

The wording indicates that you want to assert() that assumption instead.

> +        if (cq && cq->vector == vector && cq->assert_notifier.initialized) {
> +            if (cq->msg.data != msg.data || cq->msg.address != msg.address) {
> +                ret = kvm_irqchip_update_msi_route(kvm_state, cq->virq, msg,
> +                                                   pci_dev);
> +                if (ret < 0) {
> +                    return ret;
> +                }
> +                kvm_irqchip_commit_routes(kvm_state);
> +                cq->msg = msg;
> +            }
> +
> +            ret = kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
> +                                                     &cq->assert_notifier,
> +                                                     NULL, cq->virq);
> +            if (ret < 0) {
> +                return ret;
> +            }
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +static void nvme_vector_mask(PCIDevice *pci_dev, unsigned vector)
> +{
> +    NvmeCtrl *n = NVME(pci_dev);
> +
> +    trace_pci_nvme_irq_mask(vector);
> +
> +    for (uint32_t i = 0; i < n->params.max_ioqpairs + 1; i++) {
> +        NvmeCQueue *cq = n->cq[i];
> +        if (cq && cq->vector == vector && cq->assert_notifier.initialized) {
> +            kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
> +                                                  &cq->assert_notifier,
> +                                                  cq->virq);
> +        }
> +    }
> +}
> +
> +static void nvme_vector_poll(PCIDevice *pci_dev,
> +                             unsigned int vector_start,
> +                             unsigned int vector_end)
> +{
> +    NvmeCtrl *n = NVME(pci_dev);
> +
> +    trace_pci_nvme_irq_poll(vector_start, vector_end);
> +
> +    for (uint32_t i = 0; i < n->params.max_ioqpairs + 1; i++) {
> +        NvmeCQueue *cq = n->cq[i];
> +        if (cq && cq->vector >= vector_start && cq->vector <= vector_end
> +            && msix_is_masked(pci_dev, cq->vector)
> +            && cq->assert_notifier.initialized) {
> +            if (event_notifier_test_and_clear(&cq->assert_notifier)) {
> +                msix_set_pending(pci_dev, i);
> +            }
> +        }
> +    }
> +}

I'm a little fuzzy on this - what causes this function to be invoked?
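For reference, the reverse mapping mentioned above could look roughly like the sketch below: keep a per-vector list of completion queues, maintained when queues are created and deleted, so the notifier callbacks no longer scan every queue. This is an illustration only, not part of the posted patch; the `vector_cqs` array, the `vector_entry` link, the `NVME_MSIX_VECTORS` bound and the helper names are all assumptions.

```c
/*
 * Illustrative sketch only, not from the posted series.  Assumed additions:
 *   NvmeCtrl:   QTAILQ_HEAD(, NvmeCQueue) vector_cqs[NVME_MSIX_VECTORS];
 *   NvmeCQueue: QTAILQ_ENTRY(NvmeCQueue) vector_entry;
 * where NVME_MSIX_VECTORS is a stand-in for the device's MSI-X table size.
 */

static void nvme_cq_track_vector(NvmeCtrl *n, NvmeCQueue *cq)
{
    /* call from nvme_init_cq() once cq->vector is assigned */
    QTAILQ_INSERT_TAIL(&n->vector_cqs[cq->vector], cq, vector_entry);
}

static void nvme_cq_untrack_vector(NvmeCtrl *n, NvmeCQueue *cq)
{
    /* call from nvme_free_cq() before the queue is torn down */
    QTAILQ_REMOVE(&n->vector_cqs[cq->vector], cq, vector_entry);
}

static void nvme_vector_mask(PCIDevice *pci_dev, unsigned vector)
{
    NvmeCtrl *n = NVME(pci_dev);
    NvmeCQueue *cq;

    trace_pci_nvme_irq_mask(vector);

    /* only the queues actually bound to this vector are visited */
    QTAILQ_FOREACH(cq, &n->vector_cqs[vector], vector_entry) {
        if (cq->assert_notifier.initialized) {
            kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
                                                  &cq->assert_notifier,
                                                  cq->virq);
        }
    }
}
```

The unmask and poll callbacks would use the same lookup. Whether the extra bookkeeping is worth it is exactly the trade-off noted in the review, given that no known host driver masks these vectors in practice.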
On 8/16/2022 6:46 PM, Klaus Jensen wrote:
> Did qtest work out for you for testing? If so, it would be nice to add a
> simple test case as well.

Since the MSI-X masking handlers are only implemented for IO queues, if we
want to use qtest we need to implement utilities for controller
initialization and IO queue creation. Only after that can we actually test
the MSI-X masking feature. Although we may reuse some code from virtio's
tests, that is still a large amount of work.

Is it possible to get this patch merged without testing? If not, I guess
I'll have to take on the hard work of implementing something like
qtest/libqos/nvme.c.
On Aug 23 22:43, Jinhao Fan wrote:
> On 8/16/2022 6:46 PM, Klaus Jensen wrote:
> > Did qtest work out for you for testing? If so, it would be nice to add a
> > simple test case as well.
>
> Since the MSI-X masking handlers are only implemented for IO queues, if we
> want to use qtest we need to implement utilities for controller
> initialization and IO queue creation. Only after that can we actually test
> the MSI-X masking feature. Although we may reuse some code from virtio's
> tests, that is still a large amount of work.
>
> Is it possible to get this patch merged without testing? If not, I guess
> I'll have to take on the hard work of implementing something like
> qtest/libqos/nvme.c.

I'm not too happy about code that is completely untestable (worse, right now
it is actually not even runnable). What are the implications if we drop it?
That is, if we go back to your version that did not include this? If it
doesn't impact the kvm irqchip logic, then I'd rather we rip it out and leave
the device without masking/unmasking support, keeping irqfd support as an
experimental feature until we can sort this out.
On 8/24/2022 7:22 PM, Klaus Jensen wrote:
> What are the implications if we drop it? That is, if we go back to your
> version that did not include this? If it doesn't impact the kvm irqchip
> logic, then I'd rather we rip it out and leave the device without
> masking/unmasking support, keeping irqfd support as an experimental
> feature until we can sort this out.

As far as I can tell, the implication is that MSI-X masking/unmasking does
not work when irqfd and (in the future) iothread support are enabled.
Considering that we have not found any driver making use of this feature, it
seems OK to drop this support for now.
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 63f988f2f9..ac5460c7c8 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -7478,10 +7478,84 @@ static int nvme_add_pm_capability(PCIDevice *pci_dev, uint8_t offset)
 
     return 0;
 }
+static int nvme_vector_unmask(PCIDevice *pci_dev, unsigned vector,
+                              MSIMessage msg)
+{
+    NvmeCtrl *n = NVME(pci_dev);
+    int ret;
+
+    trace_pci_nvme_irq_unmask(vector, msg.address, msg.data);
+
+    for (uint32_t i = 0; i < n->params.max_ioqpairs + 1; i++) {
+        NvmeCQueue *cq = n->cq[i];
+        /*
+         * If this function is called, then irqfd must be available. Therefore,
+         * irqfd must be in use if cq->assert_notifier.initialized is true.
+         */
+        if (cq && cq->vector == vector && cq->assert_notifier.initialized) {
+            if (cq->msg.data != msg.data || cq->msg.address != msg.address) {
+                ret = kvm_irqchip_update_msi_route(kvm_state, cq->virq, msg,
+                                                   pci_dev);
+                if (ret < 0) {
+                    return ret;
+                }
+                kvm_irqchip_commit_routes(kvm_state);
+                cq->msg = msg;
+            }
+
+            ret = kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
+                                                     &cq->assert_notifier,
+                                                     NULL, cq->virq);
+            if (ret < 0) {
+                return ret;
+            }
+        }
+    }
+
+    return 0;
+}
+
+static void nvme_vector_mask(PCIDevice *pci_dev, unsigned vector)
+{
+    NvmeCtrl *n = NVME(pci_dev);
+
+    trace_pci_nvme_irq_mask(vector);
+
+    for (uint32_t i = 0; i < n->params.max_ioqpairs + 1; i++) {
+        NvmeCQueue *cq = n->cq[i];
+        if (cq && cq->vector == vector && cq->assert_notifier.initialized) {
+            kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
+                                                  &cq->assert_notifier,
+                                                  cq->virq);
+        }
+    }
+}
+
+static void nvme_vector_poll(PCIDevice *pci_dev,
+                             unsigned int vector_start,
+                             unsigned int vector_end)
+{
+    NvmeCtrl *n = NVME(pci_dev);
+
+    trace_pci_nvme_irq_poll(vector_start, vector_end);
+
+    for (uint32_t i = 0; i < n->params.max_ioqpairs + 1; i++) {
+        NvmeCQueue *cq = n->cq[i];
+        if (cq && cq->vector >= vector_start && cq->vector <= vector_end
+            && msix_is_masked(pci_dev, cq->vector)
+            && cq->assert_notifier.initialized) {
+            if (event_notifier_test_and_clear(&cq->assert_notifier)) {
+                msix_set_pending(pci_dev, i);
+            }
+        }
+    }
+}
 
 static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 {
     uint8_t *pci_conf = pci_dev->config;
+    bool with_irqfd = msix_enabled(&n->parent_obj) &&
+                      kvm_msi_via_irqfd_enabled();
     uint64_t bar_size;
     unsigned msix_table_offset, msix_pba_offset;
     int ret;
@@ -7534,6 +7608,13 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
         }
     }
 
+    if (with_irqfd) {
+        msix_set_vector_notifiers(pci_dev,
+                                  nvme_vector_unmask,
+                                  nvme_vector_mask,
+                                  nvme_vector_poll);
+    }
+
     nvme_update_msixcap_ts(pci_dev, n->conf_msix_qsize);
 
     if (n->params.cmb_size_mb) {
@@ -7781,6 +7862,7 @@ static void nvme_exit(PCIDevice *pci_dev)
         pcie_sriov_pf_exit(pci_dev);
     }
 
+    msix_unset_vector_notifiers(pci_dev);
     msix_uninit(pci_dev, &n->bar0, &n->bar0);
     memory_region_del_subregion(&n->bar0, &n->iomem);
 }
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 85fd9cd0e2..707a55ebfc 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -20,6 +20,7 @@
 #include "qemu/uuid.h"
 #include "hw/pci/pci.h"
+#include "hw/pci/msi.h"
 #include "hw/block/block.h"
 
 #include "block/nvme.h"
@@ -401,6 +402,7 @@ typedef struct NvmeCQueue {
     EventNotifier notifier;
     EventNotifier assert_notifier;
     EventNotifier deassert_notifier;
+    MSIMessage msg;
     bool first_io_cqe;
     bool ioeventfd_enabled;
     QTAILQ_HEAD(, NvmeSQueue) sq_list;
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index fccb79f489..b11fcf4a65 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -2,6 +2,9 @@
 pci_nvme_irq_msix(uint32_t vector) "raising MSI-X IRQ vector %u"
 pci_nvme_irq_pin(void) "pulsing IRQ pin"
 pci_nvme_irq_masked(void) "IRQ is masked"
+pci_nvme_irq_mask(uint32_t vector) "IRQ %u gets masked"
+pci_nvme_irq_unmask(uint32_t vector, uint64_t addr, uint32_t data) "IRQ %u gets unmasked, addr=0x%"PRIx64" data=0x%"PRIu32""
+pci_nvme_irq_poll(uint32_t vector_start, uint32_t vector_end) "IRQ poll, start=0x%"PRIu32" end=0x%"PRIu32""
 pci_nvme_dma_read(uint64_t prp1, uint64_t prp2) "DMA read, prp1=0x%"PRIx64" prp2=0x%"PRIx64""
 pci_nvme_dbbuf_config(uint64_t dbs_addr, uint64_t eis_addr) "dbs_addr=0x%"PRIx64" eis_addr=0x%"PRIx64""
 pci_nvme_map_addr(uint64_t addr, uint64_t len) "addr 0x%"PRIx64" len %"PRIu64""
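Regarding the assert() remark in the review above, one possible shape for it is sketched here. This is not from any posted revision; it simply replaces the comment with an assertion of the precondition that the notifiers are only registered when irqfd is usable (the `with_irqfd` condition in nvme_init_pci()), and elides the unchanged loop body.

```c
/*
 * Sketch only: encode the "irqfd must be available" assumption as an
 * assert() instead of a comment.  The queue scan is unchanged from the
 * patch and elided here.
 */
static int nvme_vector_unmask(PCIDevice *pci_dev, unsigned vector,
                              MSIMessage msg)
{
    /*
     * msix_set_vector_notifiers() is only called when with_irqfd is true,
     * so irqfd availability is an invariant here, not something to re-check.
     */
    assert(msix_enabled(pci_dev) && kvm_msi_via_irqfd_enabled());

    trace_pci_nvme_irq_unmask(vector, msg.address, msg.data);

    /* ... scan the completion queues exactly as in the patch ... */

    return 0;
}
```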
When irqfd is enabled, we bypass QEMU's irq emulation and let KVM directly
assert the irq. However, KVM is not aware of the device's MSI-X masking
status. Add MSI-X mask bookkeeping to the NVMe emulation and detach the
corresponding irqfd when a vector is masked.

Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
---
 hw/nvme/ctrl.c       | 82 ++++++++++++++++++++++++++++++++++++++++++++
 hw/nvme/nvme.h       |  2 ++
 hw/nvme/trace-events |  3 ++
 3 files changed, 87 insertions(+)
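As background on why the bookkeeping is needed at all, and what is lost if it is dropped as discussed above: with plain emulation QEMU's MSI-X layer checks the per-vector mask bit itself, while the irqfd path hands the event straight to the KVM irqchip, which never sees that bit. The sketch below illustrates the two delivery paths in the spirit of the earlier patches in this series rather than quoting them; the function names `nvme_irq_emulated` and `nvme_irq_irqfd` are made up for illustration.

```c
/* Emulated path: QEMU's MSI-X code honours the guest-visible mask bit.
 * msix_notify() checks msix_is_masked() and records a pending interrupt
 * instead of delivering when the vector is masked. */
static void nvme_irq_emulated(PCIDevice *pci_dev, NvmeCQueue *cq)
{
    msix_notify(pci_dev, cq->vector);
}

/* irqfd path: signalling the event notifier injects the MSI directly via
 * the KVM irqchip, bypassing QEMU's mask/pending handling entirely, which
 * is why the mask/unmask/poll notifiers in this patch are needed. */
static void nvme_irq_irqfd(NvmeCQueue *cq)
{
    event_notifier_set(&cq->assert_notifier);
}
```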