Message ID: 20230325024924.882883-2-marmarek@invisiblethingslab.com (mailing list archive)
State: New, archived
Series: [v2,1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model
On 25/03/2023 2:49 am, Marek Marczykowski-Górecki wrote: > diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c > index 9c82bf9b4ec2..9293009a4075 100644 > --- a/xen/arch/x86/hvm/vmsi.c > +++ b/xen/arch/x86/hvm/vmsi.c > @@ -438,6 +438,152 @@ static const struct hvm_io_ops msixtbl_mmio_ops = { > .write = _msixtbl_write, > }; > > +static void __iomem *msixtbl_page_handler_get_hwaddr( > + const struct domain *d, > + uint64_t address, > + bool write) > +{ > + const struct pci_dev *pdev = NULL; > + const struct msixtbl_entry *entry; > + int adj_type; > + > + rcu_read_lock(&msixtbl_rcu_lock); > + /* > + * Check if it's on the same page as the end of the MSI-X table, but > + * outside of the table itself. > + */ > + list_for_each_entry( entry, &d->arch.hvm.msixtbl_list, list ) > + { > + if ( PFN_DOWN(address) == PFN_DOWN(entry->gtable) && > + address < entry->gtable ) > + { > + adj_type = ADJ_IDX_FIRST; > + pdev = entry->pdev; > + break; > + } > + if ( PFN_DOWN(address) == PFN_DOWN(entry->gtable + entry->table_len) && > + address >= entry->gtable + entry->table_len ) > + { > + adj_type = ADJ_IDX_LAST; > + pdev = entry->pdev; > + break; > + } > + } > + rcu_read_unlock(&msixtbl_rcu_lock); > + > + if ( !pdev ) > + return NULL; > + > + ASSERT(pdev->msix); > + > + if ( !pdev->msix->adj_access_table_idx[adj_type] ) > + { > + gdprintk(XENLOG_WARNING, > + "Page for adjacent MSI-X table access not initialized for %pp\n", > + pdev); One minor observation. &pdev->sbdf Otherwise things will go wrong if sbdf ever moves from being the first element in a pdev. ~Andrew
On Sat, Mar 25, 2023 at 03:49:23AM +0100, Marek Marczykowski-Górecki wrote: > Some devices (notably Intel Wifi 6 AX210 card) keep auxiliary registers > on the same page as MSI-X table. Device model (especially one in > stubdomain) cannot really handle those, as direct writes to that page is > refused (page is on the mmio_ro_ranges list). Instead, add internal ioreq > server that handle those writes. > > Doing this, requires correlating write location with guest view > of MSI-X table address. Since QEMU doesn't map MSI-X table to the guest, > it requires msixtbl_entry->gtable, which is HVM-only. Similar feature > for PV would need to be done separately. > > This can be also used to read Pending Bit Array, if it lives on the same > page, making QEMU not needing /dev/mem access at all (especially helpful > with lockdown enabled in dom0). If PBA lives on another page, QEMU will > map it to the guest directly. > If PBA lives on the same page, forbid writes and log a message. > Technically, writes outside of PBA could be allowed, but at this moment > the precise location of PBA isn't saved. 
> > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> > --- > v2: > - adjust commit message > - pass struct domain to msixtbl_page_handler_get_hwaddr() > - reduce local variables used only once > - log a warning if write is forbidden if MSI-X and PBA lives on the same > page > - do not passthrough unaligned accesses > - handle accesses both before and after MSI-X table > --- > xen/arch/x86/hvm/vmsi.c | 154 +++++++++++++++++++++++++++++++++ > xen/arch/x86/include/asm/msi.h | 5 ++ > xen/arch/x86/msi.c | 38 ++++++++ > 3 files changed, 197 insertions(+) > > diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c > index 9c82bf9b4ec2..9293009a4075 100644 > --- a/xen/arch/x86/hvm/vmsi.c > +++ b/xen/arch/x86/hvm/vmsi.c > @@ -438,6 +438,152 @@ static const struct hvm_io_ops msixtbl_mmio_ops = { > .write = _msixtbl_write, > }; > > +static void __iomem *msixtbl_page_handler_get_hwaddr( > + const struct domain *d, > + uint64_t address, > + bool write) > +{ > + const struct pci_dev *pdev = NULL; > + const struct msixtbl_entry *entry; > + int adj_type; unsigned AFAICT. > + > + rcu_read_lock(&msixtbl_rcu_lock); > + /* > + * Check if it's on the same page as the end of the MSI-X table, but > + * outside of the table itself. > + */ > + list_for_each_entry( entry, &d->arch.hvm.msixtbl_list, list ) > + { > + if ( PFN_DOWN(address) == PFN_DOWN(entry->gtable) && > + address < entry->gtable ) > + { > + adj_type = ADJ_IDX_FIRST; > + pdev = entry->pdev; > + break; > + } > + if ( PFN_DOWN(address) == PFN_DOWN(entry->gtable + entry->table_len) && Should be entry->gtable + entry->table_len - 1, or else you are off-by-one. 
> + address >= entry->gtable + entry->table_len ) > + { > + adj_type = ADJ_IDX_LAST; > + pdev = entry->pdev; > + break; > + } > + } > + rcu_read_unlock(&msixtbl_rcu_lock); > + > + if ( !pdev ) > + return NULL; > + > + ASSERT(pdev->msix); > + > + if ( !pdev->msix->adj_access_table_idx[adj_type] ) > + { > + gdprintk(XENLOG_WARNING, > + "Page for adjacent MSI-X table access not initialized for %pp\n", > + pdev); > + > + return NULL; > + } > + > + /* If PBA lives on the same page too, forbid writes. */ > + if ( write && ( > + (adj_type == ADJ_IDX_LAST && > + pdev->msix->table.last == pdev->msix->pba.first) || > + (adj_type == ADJ_IDX_FIRST && > + pdev->msix->table.first == pdev->msix->pba.last)) ) > + { > + gdprintk(XENLOG_WARNING, > + "MSI-X table and PBA of %pp live on the same page, " > + "writing to other registers there is not implemented\n", > + pdev); Don't you actually need to check that the passed address falls into the PBA array? PBA can be sharing the same page as the MSI-X table, but that doesn't imply there aren't holes also not used by either the PBA or the MSI-X table in such page. > + return NULL; > + } > + > + return fix_to_virt(pdev->msix->adj_access_table_idx[adj_type]) + > + (address & (PAGE_SIZE - 1)); You can use PAGE_OFFSET(). > + > +} > + > +static bool cf_check msixtbl_page_accept( > + const struct hvm_io_handler *handler, const ioreq_t *r) > +{ > + ASSERT(r->type == IOREQ_TYPE_COPY); > + > + return msixtbl_page_handler_get_hwaddr( > + current->domain, r->addr, r->dir == IOREQ_WRITE); I think you want to accept it also if it's a write to the PBA, and just drop it. You should always pass write=false and then drop it in msixtbl_page_write() if it falls in the PBA region (but still return X86EMUL_OKAY). > +} > + > +static int cf_check msixtbl_page_read( > + const struct hvm_io_handler *handler, > + uint64_t address, uint32_t len, uint64_t *pval) Why use hvm_io_ops and not hvm_mmio_ops? You only care about MMIO accesses here. 
> +{ > + void __iomem *hwaddr; const I would also initialize *pval = ~0ul for safety. > + > + if ( address & (len - 1) ) > + return X86EMUL_UNHANDLEABLE; You can use IS_ALIGNED for clarity. In my fix for this for vPCI Jan asked to split unaligned accesses into 1byte ones, but I think for guests it's better to just drop them unless there's a specific case that we need to handle. Also you should return X86EMUL_OKAY here, as the address is handled here, but the current way to handle it is to drop the access. Printing a debug message can be helpful in case unaligned accesses need to be implemented in order to support some device. > + > + hwaddr = msixtbl_page_handler_get_hwaddr( > + current->domain, address, false); > + > + if ( !hwaddr ) > + return X86EMUL_UNHANDLEABLE; Maybe X86EMUL_RETRY, since the page was there in the accept handler. > + > + switch ( len ) > + { > + case 1: > + *pval = readb(hwaddr); > + break; > + case 2: > + *pval = readw(hwaddr); > + break; > + case 4: > + *pval = readl(hwaddr); > + break; > + case 8: > + *pval = readq(hwaddr); > + break; Nit: we usually add a newline after every break; > + default: > + return X86EMUL_UNHANDLEABLE; I would rather use ASSERT_UNREACHABLE() here and fallthrough to the return below. Seeing such size is likely an indication of something else gone very wrong, better to just terminate the access at once. > + } > + return X86EMUL_OKAY; > +} > + > +static int cf_check msixtbl_page_write( > + const struct hvm_io_handler *handler, > + uint64_t address, uint32_t len, uint64_t val) > +{ > + void __iomem *hwaddr = msixtbl_page_handler_get_hwaddr( > + current->domain, address, true); > + You don't seem to check whether the access is aligned here? Otherwise I have mostly the same comments as in msixtbl_page_read(). 
> + if ( !hwaddr ) > + return X86EMUL_UNHANDLEABLE; > + > + switch ( len ) { > + case 1: > + writeb(val, hwaddr); > + break; > + case 2: > + writew(val, hwaddr); > + break; > + case 4: > + writel(val, hwaddr); > + break; > + case 8: > + writeq(val, hwaddr); > + break; > + default: > + return X86EMUL_UNHANDLEABLE; > + } > + return X86EMUL_OKAY; > + > +} > + > +static const struct hvm_io_ops msixtbl_mmio_page_ops = { > + .accept = msixtbl_page_accept, > + .read = msixtbl_page_read, > + .write = msixtbl_page_write, > +}; > + > static void add_msixtbl_entry(struct domain *d, > struct pci_dev *pdev, > uint64_t gtable, > @@ -593,6 +739,14 @@ void msixtbl_init(struct domain *d) > handler->type = IOREQ_TYPE_COPY; > handler->ops = &msixtbl_mmio_ops; > } > + > + /* passthrough access to other registers on the same page */ > + handler = hvm_next_io_handler(d); > + if ( handler ) > + { > + handler->type = IOREQ_TYPE_COPY; > + handler->ops = &msixtbl_mmio_page_ops; > + } > } > > void msixtbl_pt_cleanup(struct domain *d) > diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h > index a53ade95c9ad..d13cf1c1f873 100644 > --- a/xen/arch/x86/include/asm/msi.h > +++ b/xen/arch/x86/include/asm/msi.h > @@ -207,6 +207,10 @@ struct msg_address { > PCI_MSIX_ENTRY_SIZE + \ > (~PCI_MSIX_BIRMASK & (PAGE_SIZE - 1))) > > +/* indexes in adj_access_table_idx[] below */ > +#define ADJ_IDX_FIRST 0 > +#define ADJ_IDX_LAST 1 > + > struct arch_msix { > unsigned int nr_entries, used_entries; > struct { > @@ -214,6 +218,7 @@ struct arch_msix { > } table, pba; > int table_refcnt[MAX_MSIX_TABLE_PAGES]; > int table_idx[MAX_MSIX_TABLE_PAGES]; > + int adj_access_table_idx[2]; > spinlock_t table_lock; > bool host_maskall, guest_maskall; > domid_t warned; > diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c > index d0bf63df1def..680853f84685 100644 > --- a/xen/arch/x86/msi.c > +++ b/xen/arch/x86/msi.c > @@ -961,6 +961,34 @@ static int msix_capability_init(struct pci_dev *dev, > 
domain_crash(d); > /* XXX How to deal with existing mappings? */ > } > + > + /* > + * If the MSI-X table doesn't start at the page boundary, map the first page for > + * passthrough accesses. > + */ I think you should initialize msix->adj_access_table_idx[ADJ_IDX_{FIRST,LAST}] to -1? > + if ( table_paddr & (PAGE_SIZE - 1) ) > + { > + int idx = msix_get_fixmap(msix, table_paddr, table_paddr); > + > + if ( idx >= 0 ) > + msix->adj_access_table_idx[ADJ_IDX_FIRST] = idx; > + else > + gprintk(XENLOG_ERR, "Failed to map first MSI-X table page: %d\n", idx); > + } > + /* > + * If the MSI-X table doesn't span full page(s), map the last page for > + * passthrough accesses. > + */ > + if ( (table_paddr + msix->nr_entries * PCI_MSIX_ENTRY_SIZE) & (PAGE_SIZE - 1) ) > + { > + uint64_t entry_paddr = table_paddr + msix->nr_entries * PCI_MSIX_ENTRY_SIZE; > + int idx = msix_get_fixmap(msix, table_paddr, entry_paddr); > + > + if ( idx >= 0 ) > + msix->adj_access_table_idx[ADJ_IDX_LAST] = idx; > + else > + gprintk(XENLOG_ERR, "Failed to map last MSI-X table page: %d\n", idx); > + } > } > WARN_ON(msix->table.first != (table_paddr >> PAGE_SHIFT)); > ++msix->used_entries; > @@ -1090,6 +1118,16 @@ static void _pci_cleanup_msix(struct arch_msix *msix) > WARN(); > msix->table.first = 0; > msix->table.last = 0; > + if ( msix->adj_access_table_idx[ADJ_IDX_FIRST] ) > + { > + msix_put_fixmap(msix, msix->adj_access_table_idx[ADJ_IDX_FIRST]); > + msix->adj_access_table_idx[ADJ_IDX_FIRST] = 0; > + } > + if ( msix->adj_access_table_idx[ADJ_IDX_LAST] ) > + { > + msix_put_fixmap(msix, msix->adj_access_table_idx[ADJ_IDX_LAST]); > + msix->adj_access_table_idx[ADJ_IDX_LAST] = 0; Isn't 0 a valid idx? You should check for >= 0 and also set to -1 once the fixmap entry has been put. Thanks, Roger.
On Tue, Mar 28, 2023 at 01:28:44PM +0200, Roger Pau Monné wrote: > On Sat, Mar 25, 2023 at 03:49:23AM +0100, Marek Marczykowski-Górecki wrote: > > Some devices (notably Intel Wifi 6 AX210 card) keep auxiliary registers > > on the same page as MSI-X table. Device model (especially one in > > stubdomain) cannot really handle those, as direct writes to that page is > > refused (page is on the mmio_ro_ranges list). Instead, add internal ioreq > > server that handle those writes. > > > > Doing this, requires correlating write location with guest view > > of MSI-X table address. Since QEMU doesn't map MSI-X table to the guest, > > it requires msixtbl_entry->gtable, which is HVM-only. Similar feature > > for PV would need to be done separately. > > > > This can be also used to read Pending Bit Array, if it lives on the same > > page, making QEMU not needing /dev/mem access at all (especially helpful > > with lockdown enabled in dom0). If PBA lives on another page, QEMU will > > map it to the guest directly. > > If PBA lives on the same page, forbid writes and log a message. > > Technically, writes outside of PBA could be allowed, but at this moment > > the precise location of PBA isn't saved. 
> > > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> > > --- > > v2: > > - adjust commit message > > - pass struct domain to msixtbl_page_handler_get_hwaddr() > > - reduce local variables used only once > > - log a warning if write is forbidden if MSI-X and PBA lives on the same > > page > > - do not passthrough unaligned accesses > > - handle accesses both before and after MSI-X table > > --- > > xen/arch/x86/hvm/vmsi.c | 154 +++++++++++++++++++++++++++++++++ > > xen/arch/x86/include/asm/msi.h | 5 ++ > > xen/arch/x86/msi.c | 38 ++++++++ > > 3 files changed, 197 insertions(+) > > > > diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c > > index 9c82bf9b4ec2..9293009a4075 100644 > > --- a/xen/arch/x86/hvm/vmsi.c > > +++ b/xen/arch/x86/hvm/vmsi.c > > @@ -438,6 +438,152 @@ static const struct hvm_io_ops msixtbl_mmio_ops = { > > .write = _msixtbl_write, > > }; > > > > +static void __iomem *msixtbl_page_handler_get_hwaddr( > > + const struct domain *d, > > + uint64_t address, > > + bool write) > > +{ > > + const struct pci_dev *pdev = NULL; > > + const struct msixtbl_entry *entry; > > + int adj_type; > > unsigned AFAICT. Ok. > > + > > + rcu_read_lock(&msixtbl_rcu_lock); > > + /* > > + * Check if it's on the same page as the end of the MSI-X table, but > > + * outside of the table itself. > > + */ > > + list_for_each_entry( entry, &d->arch.hvm.msixtbl_list, list ) > > + { > > + if ( PFN_DOWN(address) == PFN_DOWN(entry->gtable) && > > + address < entry->gtable ) > > + { > > + adj_type = ADJ_IDX_FIRST; > > + pdev = entry->pdev; > > + break; > > + } > > + if ( PFN_DOWN(address) == PFN_DOWN(entry->gtable + entry->table_len) && > > Should be entry->gtable + entry->table_len - 1, or else you are > off-by-one. Ok. 
> > + address >= entry->gtable + entry->table_len ) > > + { > > + adj_type = ADJ_IDX_LAST; > > + pdev = entry->pdev; > > + break; > > + } > > + } > > + rcu_read_unlock(&msixtbl_rcu_lock); > > + > > + if ( !pdev ) > > + return NULL; > > + > > + ASSERT(pdev->msix); > > + > > + if ( !pdev->msix->adj_access_table_idx[adj_type] ) > > + { > > + gdprintk(XENLOG_WARNING, > > + "Page for adjacent MSI-X table access not initialized for %pp\n", > > + pdev); > > + > > + return NULL; > > + } > > + > > + /* If PBA lives on the same page too, forbid writes. */ > > + if ( write && ( > > + (adj_type == ADJ_IDX_LAST && > > + pdev->msix->table.last == pdev->msix->pba.first) || > > + (adj_type == ADJ_IDX_FIRST && > > + pdev->msix->table.first == pdev->msix->pba.last)) ) > > + { > > + gdprintk(XENLOG_WARNING, > > + "MSI-X table and PBA of %pp live on the same page, " > > + "writing to other registers there is not implemented\n", > > + pdev); > > Don't you actually need to check that the passed address falls into the > PBA array? PBA can be sharing the same page as the MSI-X table, but > that doesn't imply there aren't holes also not used by either the PBA > or the MSI-X table in such page. I don't know exact location of PBA, so I'm rejecting writes just in case they do hit PBA (see commit message). > > + return NULL; > > + } > > + > > + return fix_to_virt(pdev->msix->adj_access_table_idx[adj_type]) + > > + (address & (PAGE_SIZE - 1)); > > You can use PAGE_OFFSET(). Ok. > > + > > +} > > + > > +static bool cf_check msixtbl_page_accept( > > + const struct hvm_io_handler *handler, const ioreq_t *r) > > +{ > > + ASSERT(r->type == IOREQ_TYPE_COPY); > > + > > + return msixtbl_page_handler_get_hwaddr( > > + current->domain, r->addr, r->dir == IOREQ_WRITE); > > I think you want to accept it also if it's a write to the PBA, and > just drop it. You should always pass write=false and then drop it in > msixtbl_page_write() if it falls in the PBA region (but still return > X86EMUL_OKAY). 
I don't want to interfere with msixtbl_mmio_page_ops, this handler is only about accesses not hitting actual MSI-X structures. > > +} > > + > > +static int cf_check msixtbl_page_read( > > + const struct hvm_io_handler *handler, > > + uint64_t address, uint32_t len, uint64_t *pval) > > Why use hvm_io_ops and not hvm_mmio_ops? You only care about MMIO > accesses here. I followed how msixtbl_mmio_ops are registered. Should that also be changed to hvm_mmio_ops? > > > +{ > > + void __iomem *hwaddr; > > const > > I would also initialize *pval = ~0ul for safety. When returning X86EMUL_OKAY, then I agree. > > + > > + if ( address & (len - 1) ) > > + return X86EMUL_UNHANDLEABLE; > > You can use IS_ALIGNED for clarity. In my fix for this for vPCI Jan > asked to split unaligned accesses into 1byte ones, but I think for > guests it's better to just drop them unless there's a specific case > that we need to handle. > > Also you should return X86EMUL_OKAY here, as the address is handled > here, but the current way to handle it is to drop the access. > > Printing a debug message can be helpful in case unaligned accesses > need to be implemented in order to support some device. > > > + > > + hwaddr = msixtbl_page_handler_get_hwaddr( > > + current->domain, address, false); > > + > > + if ( !hwaddr ) > > + return X86EMUL_UNHANDLEABLE; > > Maybe X86EMUL_RETRY, since the page was there in the accept handler. How does X86EMUL_RETRY work? Is it trying to find the handler again? > > + > > + switch ( len ) > > + { > > + case 1: > > + *pval = readb(hwaddr); > > + break; > > + case 2: > > + *pval = readw(hwaddr); > > + break; > > + case 4: > > + *pval = readl(hwaddr); > > + break; > > + case 8: > > + *pval = readq(hwaddr); > > + break; > > Nit: we usually add a newline after every break; Ok. > > + default: > > + return X86EMUL_UNHANDLEABLE; > > I would rather use ASSERT_UNREACHABLE() here and fallthrough to the > return below. 
Seeing such size is likely an indication of something > else gone very wrong, better to just terminate the access at once. > > > + } > > + return X86EMUL_OKAY; > > +} > > + > > +static int cf_check msixtbl_page_write( > > + const struct hvm_io_handler *handler, > > + uint64_t address, uint32_t len, uint64_t val) > > +{ > > + void __iomem *hwaddr = msixtbl_page_handler_get_hwaddr( > > + current->domain, address, true); > > + > > You don't seem to check whether the access is aligned here? > > Otherwise I have mostly the same comments as in msixtbl_page_read(). Ok. > > + if ( !hwaddr ) > > + return X86EMUL_UNHANDLEABLE; > > + > > + switch ( len ) { > > + case 1: > > + writeb(val, hwaddr); > > + break; > > + case 2: > > + writew(val, hwaddr); > > + break; > > + case 4: > > + writel(val, hwaddr); > > + break; > > + case 8: > > + writeq(val, hwaddr); > > + break; > > + default: > > + return X86EMUL_UNHANDLEABLE; > > + } > > + return X86EMUL_OKAY; > > + > > +} > > + > > +static const struct hvm_io_ops msixtbl_mmio_page_ops = { > > + .accept = msixtbl_page_accept, > > + .read = msixtbl_page_read, > > + .write = msixtbl_page_write, > > +}; > > + > > static void add_msixtbl_entry(struct domain *d, > > struct pci_dev *pdev, > > uint64_t gtable, > > @@ -593,6 +739,14 @@ void msixtbl_init(struct domain *d) > > handler->type = IOREQ_TYPE_COPY; > > handler->ops = &msixtbl_mmio_ops; > > } > > + > > + /* passthrough access to other registers on the same page */ > > + handler = hvm_next_io_handler(d); > > + if ( handler ) > > + { > > + handler->type = IOREQ_TYPE_COPY; > > + handler->ops = &msixtbl_mmio_page_ops; > > + } > > } > > > > void msixtbl_pt_cleanup(struct domain *d) > > diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h > > index a53ade95c9ad..d13cf1c1f873 100644 > > --- a/xen/arch/x86/include/asm/msi.h > > +++ b/xen/arch/x86/include/asm/msi.h > > @@ -207,6 +207,10 @@ struct msg_address { > > PCI_MSIX_ENTRY_SIZE + \ > > (~PCI_MSIX_BIRMASK & 
(PAGE_SIZE - 1))) > > > > +/* indexes in adj_access_table_idx[] below */ > > +#define ADJ_IDX_FIRST 0 > > +#define ADJ_IDX_LAST 1 > > + > > struct arch_msix { > > unsigned int nr_entries, used_entries; > > struct { > > @@ -214,6 +218,7 @@ struct arch_msix { > > } table, pba; > > int table_refcnt[MAX_MSIX_TABLE_PAGES]; > > int table_idx[MAX_MSIX_TABLE_PAGES]; > > + int adj_access_table_idx[2]; > > spinlock_t table_lock; > > bool host_maskall, guest_maskall; > > domid_t warned; > > diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c > > index d0bf63df1def..680853f84685 100644 > > --- a/xen/arch/x86/msi.c > > +++ b/xen/arch/x86/msi.c > > @@ -961,6 +961,34 @@ static int msix_capability_init(struct pci_dev *dev, > > domain_crash(d); > > /* XXX How to deal with existing mappings? */ > > } > > + > > + /* > > + * If the MSI-X table doesn't start at the page boundary, map the first page for > > + * passthrough accesses. > > + */ > > I think you should initialize > msix->adj_access_table_idx[ADJ_IDX_{FIRST,LAST}] to -1? > > > + if ( table_paddr & (PAGE_SIZE - 1) ) > > + { > > + int idx = msix_get_fixmap(msix, table_paddr, table_paddr); > > + > > + if ( idx >= 0 ) > > + msix->adj_access_table_idx[ADJ_IDX_FIRST] = idx; > > + else > > + gprintk(XENLOG_ERR, "Failed to map first MSI-X table page: %d\n", idx); > > + } > > + /* > > + * If the MSI-X table doesn't span full page(s), map the last page for > > + * passthrough accesses. 
> > + */ > > + if ( (table_paddr + msix->nr_entries * PCI_MSIX_ENTRY_SIZE) & (PAGE_SIZE - 1) ) > > + { > > + uint64_t entry_paddr = table_paddr + msix->nr_entries * PCI_MSIX_ENTRY_SIZE; > > + int idx = msix_get_fixmap(msix, table_paddr, entry_paddr); > > + > > + if ( idx >= 0 ) > > + msix->adj_access_table_idx[ADJ_IDX_LAST] = idx; > > + else > > + gprintk(XENLOG_ERR, "Failed to map last MSI-X table page: %d\n", idx); > > + } > > } > > WARN_ON(msix->table.first != (table_paddr >> PAGE_SHIFT)); > > ++msix->used_entries; > > @@ -1090,6 +1118,16 @@ static void _pci_cleanup_msix(struct arch_msix *msix) > > WARN(); > > msix->table.first = 0; > > msix->table.last = 0; > > + if ( msix->adj_access_table_idx[ADJ_IDX_FIRST] ) > > + { > > + msix_put_fixmap(msix, msix->adj_access_table_idx[ADJ_IDX_FIRST]); > > + msix->adj_access_table_idx[ADJ_IDX_FIRST] = 0; > > + } > > + if ( msix->adj_access_table_idx[ADJ_IDX_LAST] ) > > + { > > + msix_put_fixmap(msix, msix->adj_access_table_idx[ADJ_IDX_LAST]); > > + msix->adj_access_table_idx[ADJ_IDX_LAST] = 0; > > Isn't 0 a valid idx? You should check for >= 0 and also set > to -1 once the fixmap entry has been put. I rely here on msix using specific range of fixmap indexes (FIX_MSIX_TO_RESERV_BASE - FIX_MSIX_TO_RESERV_END), which isn't starting at 0. For this reason also, I keep unused entries at 0 (no need explicit initialization).
On 28.03.2023 14:05, Marek Marczykowski-Górecki wrote: > On Tue, Mar 28, 2023 at 01:28:44PM +0200, Roger Pau Monné wrote: >> On Sat, Mar 25, 2023 at 03:49:23AM +0100, Marek Marczykowski-Górecki wrote: >>> +static bool cf_check msixtbl_page_accept( >>> + const struct hvm_io_handler *handler, const ioreq_t *r) >>> +{ >>> + ASSERT(r->type == IOREQ_TYPE_COPY); >>> + >>> + return msixtbl_page_handler_get_hwaddr( >>> + current->domain, r->addr, r->dir == IOREQ_WRITE); >> >> I think you want to accept it also if it's a write to the PBA, and >> just drop it. You should always pass write=false and then drop it in >> msixtbl_page_write() if it falls in the PBA region (but still return >> X86EMUL_OKAY). > > I don't want to interfere with msixtbl_mmio_page_ops, this handler is > only about accesses not hitting actual MSI-X structures. In his functionally similar vPCI change I did ask Roger to handle the "extra" space right from the same handlers. Maybe that's going to be best here, too. >>> + hwaddr = msixtbl_page_handler_get_hwaddr( >>> + current->domain, address, false); >>> + >>> + if ( !hwaddr ) >>> + return X86EMUL_UNHANDLEABLE; >> >> Maybe X86EMUL_RETRY, since the page was there in the accept handler. > > How does X86EMUL_RETRY work? Is it trying to find the handler again? It exits back to guest context, to restart the insn in question. Then the full emulation path may be re-triggered. If the region was removed from the guest, a handler then won't be found, and the access handed to the device model. >>> --- a/xen/arch/x86/msi.c >>> +++ b/xen/arch/x86/msi.c >>> @@ -961,6 +961,34 @@ static int msix_capability_init(struct pci_dev *dev, >>> domain_crash(d); >>> /* XXX How to deal with existing mappings? */ >>> } >>> + >>> + /* >>> + * If the MSI-X table doesn't start at the page boundary, map the first page for >>> + * passthrough accesses. >>> + */ >> >> I think you should initialize >> msix->adj_access_table_idx[ADJ_IDX_{FIRST,LAST}] to -1? 
Or better not use a signed type there and set to UINT_MAX here. Jan
On Tue, Mar 28, 2023 at 02:34:23PM +0200, Jan Beulich wrote: > On 28.03.2023 14:05, Marek Marczykowski-Górecki wrote: > > On Tue, Mar 28, 2023 at 01:28:44PM +0200, Roger Pau Monné wrote: > >> On Sat, Mar 25, 2023 at 03:49:23AM +0100, Marek Marczykowski-Górecki wrote: > >>> +static bool cf_check msixtbl_page_accept( > >>> + const struct hvm_io_handler *handler, const ioreq_t *r) > >>> +{ > >>> + ASSERT(r->type == IOREQ_TYPE_COPY); > >>> + > >>> + return msixtbl_page_handler_get_hwaddr( > >>> + current->domain, r->addr, r->dir == IOREQ_WRITE); > >> > >> I think you want to accept it also if it's a write to the PBA, and > >> just drop it. You should always pass write=false and then drop it in > >> msixtbl_page_write() if it falls in the PBA region (but still return > >> X86EMUL_OKAY). > > > > I don't want to interfere with msixtbl_mmio_page_ops, this handler is > > only about accesses not hitting actual MSI-X structures. > > In his functionally similar vPCI change I did ask Roger to handle the > "extra" space right from the same handlers. Maybe that's going to be > best here, too. I have considered this option, but msixtbl_range() is already quite complex, adding yet another case there won't make it easier to follow. I mean, technically I can probably merge those two handlers together, but I don't think it will result in nicer code. Especially since the general direction is to abandon split of MSI-X table access handling between Xen and QEMU and go with just QEMU doing it, hopefully at some point not needing msixtbl_mmio_ops anymore (but still needing the one for adjacent accesses). > >>> --- a/xen/arch/x86/msi.c > >>> +++ b/xen/arch/x86/msi.c > >>> @@ -961,6 +961,34 @@ static int msix_capability_init(struct pci_dev *dev, > >>> domain_crash(d); > >>> /* XXX How to deal with existing mappings? */ > >>> } > >>> + > >>> + /* > >>> + * If the MSI-X table doesn't start at the page boundary, map the first page for > >>> + * passthrough accesses. 
> >>> + */ > >> > >> I think you should initialize > >> msix->adj_access_table_idx[ADJ_IDX_{FIRST,LAST}] to -1? > > Or better not use a signed type there and set to UINT_MAX here. If not using 0 as unused entry (see the other commend I made in response to Roger), then that's probably the way to go.
On Tue, Mar 28, 2023 at 02:05:14PM +0200, Marek Marczykowski-Górecki wrote: > On Tue, Mar 28, 2023 at 01:28:44PM +0200, Roger Pau Monné wrote: > > On Sat, Mar 25, 2023 at 03:49:23AM +0100, Marek Marczykowski-Górecki wrote: > > > + address >= entry->gtable + entry->table_len ) > > > + { > > > + adj_type = ADJ_IDX_LAST; > > > + pdev = entry->pdev; > > > + break; > > > + } > > > + } > > > + rcu_read_unlock(&msixtbl_rcu_lock); > > > + > > > + if ( !pdev ) > > > + return NULL; > > > + > > > + ASSERT(pdev->msix); > > > + > > > + if ( !pdev->msix->adj_access_table_idx[adj_type] ) > > > + { > > > + gdprintk(XENLOG_WARNING, > > > + "Page for adjacent MSI-X table access not initialized for %pp\n", > > > + pdev); > > > + > > > + return NULL; > > > + } > > > + > > > + /* If PBA lives on the same page too, forbid writes. */ > > > + if ( write && ( > > > + (adj_type == ADJ_IDX_LAST && > > > + pdev->msix->table.last == pdev->msix->pba.first) || > > > + (adj_type == ADJ_IDX_FIRST && > > > + pdev->msix->table.first == pdev->msix->pba.last)) ) > > > + { > > > + gdprintk(XENLOG_WARNING, > > > + "MSI-X table and PBA of %pp live on the same page, " > > > + "writing to other registers there is not implemented\n", > > > + pdev); > > > > Don't you actually need to check that the passed address falls into the > > PBA array? PBA can be sharing the same page as the MSI-X table, but > > that doesn't imply there aren't holes also not used by either the PBA > > or the MSI-X table in such page. > > I don't know exact location of PBA, so I'm rejecting writes just in case > they do hit PBA (see commit message). Hm, maybe we should cache the address and size of the PBA in msix_capability_init(). 
> > > + > > > +} > > > + > > > +static bool cf_check msixtbl_page_accept( > > > + const struct hvm_io_handler *handler, const ioreq_t *r) > > > +{ > > > + ASSERT(r->type == IOREQ_TYPE_COPY); > > > + > > > + return msixtbl_page_handler_get_hwaddr( > > > + current->domain, r->addr, r->dir == IOREQ_WRITE); > > > > I think you want to accept it also if it's a write to the PBA, and > > just drop it. You should always pass write=false and then drop it in > > msixtbl_page_write() if it falls in the PBA region (but still return > > X86EMUL_OKAY). > > I don't want to interfere with msixtbl_mmio_page_ops, this handler is > only about accesses not hitting actual MSI-X structures. But msixtbl_mmio_page_ops doesn't handle PBA array accesses at all, so you won't interfere by accepting PBA writes here and dropping them in msixtbl_page_write()? Maybe there's some piece I'm missing. > > > +} > > > + > > > +static int cf_check msixtbl_page_read( > > > + const struct hvm_io_handler *handler, > > > + uint64_t address, uint32_t len, uint64_t *pval) > > > > Why use hvm_io_ops and not hvm_mmio_ops? You only care about MMIO > > accesses here. > > I followed how msixtbl_mmio_ops are registered. Should that also be > changed to hvm_mmio_ops? Maybe, but let's leave that for another series I think. Using hvm_mmio_ops and register_mmio_handler() together with the slightly simplified handlers will allow you to drop some of the code. > > > + > > > + if ( address & (len - 1) ) > > > + return X86EMUL_UNHANDLEABLE; > > > > You can use IS_ALIGNED for clarity. In my fix for this for vPCI Jan > > asked to split unaligned accesses into 1byte ones, but I think for > > guests it's better to just drop them unless there's a specific case > > that we need to handle. > > > > Also you should return X86EMUL_OKAY here, as the address is handled > > here, but the current way to handle it is to drop the access. 
> > > > Printing a debug message can be helpful in case unaligned accesses > > need to be implemented in order to support some device. > > > > > + > > > + hwaddr = msixtbl_page_handler_get_hwaddr( > > > + current->domain, address, false); > > > + > > > + if ( !hwaddr ) > > > + return X86EMUL_UNHANDLEABLE; > > > > Maybe X86EMUL_RETRY, since the page was there in the accept handler. > > How does X86EMUL_RETRY work? Is it trying to find the handler again? Hm, I'm not sure it works as I thought it does, so maybe not a good suggestion after all. > > > @@ -1090,6 +1118,16 @@ static void _pci_cleanup_msix(struct arch_msix *msix) > > > WARN(); > > > msix->table.first = 0; > > > msix->table.last = 0; > > > + if ( msix->adj_access_table_idx[ADJ_IDX_FIRST] ) > > > + { > > > + msix_put_fixmap(msix, msix->adj_access_table_idx[ADJ_IDX_FIRST]); > > > + msix->adj_access_table_idx[ADJ_IDX_FIRST] = 0; > > > + } > > > + if ( msix->adj_access_table_idx[ADJ_IDX_LAST] ) > > > + { > > > + msix_put_fixmap(msix, msix->adj_access_table_idx[ADJ_IDX_LAST]); > > > + msix->adj_access_table_idx[ADJ_IDX_LAST] = 0; > > > > Isn't 0 a valid idx? You should check for >= 0 and also set > > to -1 once the fixmap entry has been put. > > I rely here on msix using specific range of fixmap indexes > (FIX_MSIX_TO_RESERV_BASE - FIX_MSIX_TO_RESERV_END), which isn't starting > at 0. For this reason also, I keep unused entries at 0 (no need explicit > initialization). Hm, OK, then you should also check that the return of msix_get_fixmap() is strictly > 0, as right now it's using >= (and thus would think 0 is a valid fixmap entry). Thanks, Roger.
On 28.03.2023 14:52, Marek Marczykowski-Górecki wrote:
> On Tue, Mar 28, 2023 at 02:34:23PM +0200, Jan Beulich wrote:
>> On 28.03.2023 14:05, Marek Marczykowski-Górecki wrote:
>>> On Tue, Mar 28, 2023 at 01:28:44PM +0200, Roger Pau Monné wrote:
>>>> On Sat, Mar 25, 2023 at 03:49:23AM +0100, Marek Marczykowski-Górecki wrote:
>>>>> +static bool cf_check msixtbl_page_accept(
>>>>> +    const struct hvm_io_handler *handler, const ioreq_t *r)
>>>>> +{
>>>>> +    ASSERT(r->type == IOREQ_TYPE_COPY);
>>>>> +
>>>>> +    return msixtbl_page_handler_get_hwaddr(
>>>>> +               current->domain, r->addr, r->dir == IOREQ_WRITE);
>>>>
>>>> I think you want to accept it also if it's a write to the PBA, and
>>>> just drop it. You should always pass write=false and then drop it in
>>>> msixtbl_page_write() if it falls in the PBA region (but still return
>>>> X86EMUL_OKAY).
>>>
>>> I don't want to interfere with msixtbl_mmio_page_ops, this handler is
>>> only about accesses not hitting actual MSI-X structures.
>>
>> In his functionally similar vPCI change I did ask Roger to handle the
>> "extra" space right from the same handlers. Maybe that's going to be
>> best here, too.
>
> I have considered this option, but msixtbl_range() is already quite
> complex, adding yet another case there won't make it easier to follow.

Do you care about the case of msixtbl_addr_to_desc() returning NULL at
all for the purpose you have? Like in Roger's patch I'd assume
msixtbl_find_entry() needs extending what ranges it accepts; if need
be another parameter may be added to cover cases where the extended
coverage isn't wanted.

> I mean, technically I can probably merge those two handlers together,
> but I don't think it will result in nicer code. Especially since the
> general direction is to abandon split of MSI-X table access handling
> between Xen and QEMU and go with just QEMU doing it, hopefully at some
> point not needing msixtbl_mmio_ops anymore (but still needing the one
> for adjacent accesses).

Hmm, at this point I'm not convinced of this plan. Instead I was hoping
that once vPCI properly supports PVH DomU-s, we may also be able to make
use of it for HVM, delegating less to qemu rather than more.

Jan
On Tue, Mar 28, 2023 at 03:03:17PM +0200, Jan Beulich wrote:
> On 28.03.2023 14:52, Marek Marczykowski-Górecki wrote:
> > On Tue, Mar 28, 2023 at 02:34:23PM +0200, Jan Beulich wrote:
> >> On 28.03.2023 14:05, Marek Marczykowski-Górecki wrote:
> >>> On Tue, Mar 28, 2023 at 01:28:44PM +0200, Roger Pau Monné wrote:
> >>>> On Sat, Mar 25, 2023 at 03:49:23AM +0100, Marek Marczykowski-Górecki wrote:
> >>>>> +static bool cf_check msixtbl_page_accept(
> >>>>> +    const struct hvm_io_handler *handler, const ioreq_t *r)
> >>>>> +{
> >>>>> +    ASSERT(r->type == IOREQ_TYPE_COPY);
> >>>>> +
> >>>>> +    return msixtbl_page_handler_get_hwaddr(
> >>>>> +               current->domain, r->addr, r->dir == IOREQ_WRITE);
> >>>>
> >>>> I think you want to accept it also if it's a write to the PBA, and
> >>>> just drop it. You should always pass write=false and then drop it in
> >>>> msixtbl_page_write() if it falls in the PBA region (but still return
> >>>> X86EMUL_OKAY).
> >>>
> >>> I don't want to interfere with msixtbl_mmio_page_ops, this handler is
> >>> only about accesses not hitting actual MSI-X structures.
> >>
> >> In his functionally similar vPCI change I did ask Roger to handle the
> >> "extra" space right from the same handlers. Maybe that's going to be
> >> best here, too.
> >
> > I have considered this option, but msixtbl_range() is already quite
> > complex, adding yet another case there won't make it easier to follow.
> 
> Do you care about the case of msixtbl_addr_to_desc() returning NULL at
> all for the purpose you have? Like in Roger's patch I'd assume
> msixtbl_find_entry() needs extending what ranges it accepts; if need
> be another parameter may be added to cover cases where the extended
> coverage isn't wanted.
> 
> > I mean, technically I can probably merge those two handlers together,
> > but I don't think it will result in nicer code. Especially since the
> > general direction is to abandon split of MSI-X table access handling
> > between Xen and QEMU and go with just QEMU doing it, hopefully at some
> > point not needing msixtbl_mmio_ops anymore (but still needing the one
> > for adjacent accesses).
> 
> Hmm, at this point I'm not convinced of this plan. Instead I was hoping
> that once vPCI properly supports PVH DomU-s, we may also be able to make
> use of it for HVM, delegating less to qemu rather than more.

Right, but vPCI doesn't use msixtbl_mmio_ops at all.

I have to admit I don't like the split model we currently use for MSIX
handling with QEMU. I find it difficult to understand, and prone to
errors.

IMO it would be much better if all the handling was either done in QEMU
(except for the quirk we need here to handle adjacent accesses) or Xen
by using vPCI. Hence I proposed to Marek that given writes to MSIX
entries need to be propagated to QEMU in order for it to also track the
mask bit, we could also move the handling of the mask bit to QEMU
entirely, dropping the usage of msixtbl_mmio_ops.

At some point I expect we should be able to choose between doing the
handling in QEMU or vPCI in Xen.

Thanks, Roger.
On Tue, Mar 28, 2023 at 03:03:17PM +0200, Jan Beulich wrote:
> On 28.03.2023 14:52, Marek Marczykowski-Górecki wrote:
> > On Tue, Mar 28, 2023 at 02:34:23PM +0200, Jan Beulich wrote:
> >> On 28.03.2023 14:05, Marek Marczykowski-Górecki wrote:
> >>> On Tue, Mar 28, 2023 at 01:28:44PM +0200, Roger Pau Monné wrote:
> >>>> On Sat, Mar 25, 2023 at 03:49:23AM +0100, Marek Marczykowski-Górecki wrote:
> >>>>> +static bool cf_check msixtbl_page_accept(
> >>>>> +    const struct hvm_io_handler *handler, const ioreq_t *r)
> >>>>> +{
> >>>>> +    ASSERT(r->type == IOREQ_TYPE_COPY);
> >>>>> +
> >>>>> +    return msixtbl_page_handler_get_hwaddr(
> >>>>> +               current->domain, r->addr, r->dir == IOREQ_WRITE);
> >>>>
> >>>> I think you want to accept it also if it's a write to the PBA, and
> >>>> just drop it. You should always pass write=false and then drop it in
> >>>> msixtbl_page_write() if it falls in the PBA region (but still return
> >>>> X86EMUL_OKAY).
> >>>
> >>> I don't want to interfere with msixtbl_mmio_page_ops, this handler is
> >>> only about accesses not hitting actual MSI-X structures.
> >>
> >> In his functionally similar vPCI change I did ask Roger to handle the
> >> "extra" space right from the same handlers. Maybe that's going to be
> >> best here, too.
> >
> > I have considered this option, but msixtbl_range() is already quite
> > complex, adding yet another case there won't make it easier to follow.
> 
> Do you care about the case of msixtbl_addr_to_desc() returning NULL at
> all for the purpose you have?

IIUC I care specifically about this case.

> Like in Roger's patch I'd assume
> msixtbl_find_entry() needs extending what ranges it accepts; if need
> be another parameter may be added to cover cases where the extended
> coverage isn't wanted.
> 
> > I mean, technically I can probably merge those two handlers together,
> > but I don't think it will result in nicer code. Especially since the
> > general direction is to abandon split of MSI-X table access handling
> > between Xen and QEMU and go with just QEMU doing it, hopefully at some
> > point not needing msixtbl_mmio_ops anymore (but still needing the one
> > for adjacent accesses).
> 
> Hmm, at this point I'm not convinced of this plan. Instead I was hoping
> that once vPCI properly supports PVH DomU-s, we may also be able to make
> use of it for HVM, delegating less to qemu rather than more.

In that case, this code won't be needed anymore, which will also make
this handler unnecessary.

Anyway, I tried to merge this handling into existing handlers and the
resulting patch is slightly bigger, so it doesn't seem to avoid any
duplication. The only benefit I can think of is avoiding iterating
msixtbl_list twice (for respective accept callbacks) on each access. Is
it worth a bit more complicated handlers?
On 03.04.2023 06:21, Marek Marczykowski-Górecki wrote:
> On Tue, Mar 28, 2023 at 03:03:17PM +0200, Jan Beulich wrote:
>> On 28.03.2023 14:52, Marek Marczykowski-Górecki wrote:
>>> I mean, technically I can probably merge those two handlers together,
>>> but I don't think it will result in nicer code. Especially since the
>>> general direction is to abandon split of MSI-X table access handling
>>> between Xen and QEMU and go with just QEMU doing it, hopefully at some
>>> point not needing msixtbl_mmio_ops anymore (but still needing the one
>>> for adjacent accesses).
>>
>> Hmm, at this point I'm not convinced of this plan. Instead I was hoping
>> that once vPCI properly supports PVH DomU-s, we may also be able to make
>> use of it for HVM, delegating less to qemu rather than more.
>
> In that case, this code won't be needed anymore, which will also make
> this handler unnecessary.
>
> Anyway, I tried to merge this handling into existing handlers and the
> resulting patch is slightly bigger, so it doesn't seem to avoid any
> duplication. The only benefit I can think of is avoiding iterating
> msixtbl_list twice (for respective accept callbacks) on each access. Is
> it worth a bit more complicated handlers?

Well, limiting duplication (if possible) is only one aspect. Again
referring to Roger's functionally similar vPCI work, an important (from
my pov) aspect is to avoid consuming another handler slot (via a new
call to hvm_next_io_handler()).

Jan
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 9c82bf9b4ec2..9293009a4075 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -438,6 +438,152 @@ static const struct hvm_io_ops msixtbl_mmio_ops = {
     .write = _msixtbl_write,
 };
 
+static void __iomem *msixtbl_page_handler_get_hwaddr(
+    const struct domain *d,
+    uint64_t address,
+    bool write)
+{
+    const struct pci_dev *pdev = NULL;
+    const struct msixtbl_entry *entry;
+    int adj_type;
+
+    rcu_read_lock(&msixtbl_rcu_lock);
+    /*
+     * Check if it's on the same page as the end of the MSI-X table, but
+     * outside of the table itself.
+     */
+    list_for_each_entry( entry, &d->arch.hvm.msixtbl_list, list )
+    {
+        if ( PFN_DOWN(address) == PFN_DOWN(entry->gtable) &&
+             address < entry->gtable )
+        {
+            adj_type = ADJ_IDX_FIRST;
+            pdev = entry->pdev;
+            break;
+        }
+        if ( PFN_DOWN(address) == PFN_DOWN(entry->gtable + entry->table_len) &&
+             address >= entry->gtable + entry->table_len )
+        {
+            adj_type = ADJ_IDX_LAST;
+            pdev = entry->pdev;
+            break;
+        }
+    }
+    rcu_read_unlock(&msixtbl_rcu_lock);
+
+    if ( !pdev )
+        return NULL;
+
+    ASSERT(pdev->msix);
+
+    if ( !pdev->msix->adj_access_table_idx[adj_type] )
+    {
+        gdprintk(XENLOG_WARNING,
+                 "Page for adjacent MSI-X table access not initialized for %pp\n",
+                 pdev);
+
+        return NULL;
+    }
+
+    /* If PBA lives on the same page too, forbid writes. */
+    if ( write && (
+         (adj_type == ADJ_IDX_LAST &&
+          pdev->msix->table.last == pdev->msix->pba.first) ||
+         (adj_type == ADJ_IDX_FIRST &&
+          pdev->msix->table.first == pdev->msix->pba.last)) )
+    {
+        gdprintk(XENLOG_WARNING,
+                 "MSI-X table and PBA of %pp live on the same page, "
+                 "writing to other registers there is not implemented\n",
+                 pdev);
+        return NULL;
+    }
+
+    return fix_to_virt(pdev->msix->adj_access_table_idx[adj_type]) +
+           (address & (PAGE_SIZE - 1));
+
+}
+
+static bool cf_check msixtbl_page_accept(
+    const struct hvm_io_handler *handler, const ioreq_t *r)
+{
+    ASSERT(r->type == IOREQ_TYPE_COPY);
+
+    return msixtbl_page_handler_get_hwaddr(
+               current->domain, r->addr, r->dir == IOREQ_WRITE);
+}
+
+static int cf_check msixtbl_page_read(
+    const struct hvm_io_handler *handler,
+    uint64_t address, uint32_t len, uint64_t *pval)
+{
+    void __iomem *hwaddr;
+
+    if ( address & (len - 1) )
+        return X86EMUL_UNHANDLEABLE;
+
+    hwaddr = msixtbl_page_handler_get_hwaddr(
+        current->domain, address, false);
+
+    if ( !hwaddr )
+        return X86EMUL_UNHANDLEABLE;
+
+    switch ( len )
+    {
+    case 1:
+        *pval = readb(hwaddr);
+        break;
+    case 2:
+        *pval = readw(hwaddr);
+        break;
+    case 4:
+        *pval = readl(hwaddr);
+        break;
+    case 8:
+        *pval = readq(hwaddr);
+        break;
+    default:
+        return X86EMUL_UNHANDLEABLE;
+    }
+    return X86EMUL_OKAY;
+}
+
+static int cf_check msixtbl_page_write(
+    const struct hvm_io_handler *handler,
+    uint64_t address, uint32_t len, uint64_t val)
+{
+    void __iomem *hwaddr = msixtbl_page_handler_get_hwaddr(
+        current->domain, address, true);
+
+    if ( !hwaddr )
+        return X86EMUL_UNHANDLEABLE;
+
+    switch ( len ) {
+    case 1:
+        writeb(val, hwaddr);
+        break;
+    case 2:
+        writew(val, hwaddr);
+        break;
+    case 4:
+        writel(val, hwaddr);
+        break;
+    case 8:
+        writeq(val, hwaddr);
+        break;
+    default:
+        return X86EMUL_UNHANDLEABLE;
+    }
+    return X86EMUL_OKAY;
+
+}
+
+static const struct hvm_io_ops msixtbl_mmio_page_ops = {
+    .accept = msixtbl_page_accept,
+    .read = msixtbl_page_read,
+    .write = msixtbl_page_write,
+};
+
 static void add_msixtbl_entry(struct domain *d,
                               struct pci_dev *pdev,
                               uint64_t gtable,
@@ -593,6 +739,14 @@ void msixtbl_init(struct domain *d)
         handler->type = IOREQ_TYPE_COPY;
         handler->ops = &msixtbl_mmio_ops;
     }
+
+    /* passthrough access to other registers on the same page */
+    handler = hvm_next_io_handler(d);
+    if ( handler )
+    {
+        handler->type = IOREQ_TYPE_COPY;
+        handler->ops = &msixtbl_mmio_page_ops;
+    }
 }
 
 void msixtbl_pt_cleanup(struct domain *d)
diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h
index a53ade95c9ad..d13cf1c1f873 100644
--- a/xen/arch/x86/include/asm/msi.h
+++ b/xen/arch/x86/include/asm/msi.h
@@ -207,6 +207,10 @@ struct msg_address {
                        PCI_MSIX_ENTRY_SIZE + \
                        (~PCI_MSIX_BIRMASK & (PAGE_SIZE - 1)))
 
+/* indexes in adj_access_table_idx[] below */
+#define ADJ_IDX_FIRST 0
+#define ADJ_IDX_LAST  1
+
 struct arch_msix {
     unsigned int nr_entries, used_entries;
     struct {
@@ -214,6 +218,7 @@ struct arch_msix {
     } table, pba;
     int table_refcnt[MAX_MSIX_TABLE_PAGES];
     int table_idx[MAX_MSIX_TABLE_PAGES];
+    int adj_access_table_idx[2];
     spinlock_t table_lock;
     bool host_maskall, guest_maskall;
     domid_t warned;
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index d0bf63df1def..680853f84685 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -961,6 +961,34 @@ static int msix_capability_init(struct pci_dev *dev,
             domain_crash(d); /* XXX How to deal with existing mappings? */
         }
+
+        /*
+         * If the MSI-X table doesn't start at the page boundary, map the first page for
+         * passthrough accesses.
+         */
+        if ( table_paddr & (PAGE_SIZE - 1) )
+        {
+            int idx = msix_get_fixmap(msix, table_paddr, table_paddr);
+
+            if ( idx >= 0 )
+                msix->adj_access_table_idx[ADJ_IDX_FIRST] = idx;
+            else
+                gprintk(XENLOG_ERR, "Failed to map first MSI-X table page: %d\n", idx);
+        }
+        /*
+         * If the MSI-X table doesn't span full page(s), map the last page for
+         * passthrough accesses.
+         */
+        if ( (table_paddr + msix->nr_entries * PCI_MSIX_ENTRY_SIZE) & (PAGE_SIZE - 1) )
+        {
+            uint64_t entry_paddr = table_paddr + msix->nr_entries * PCI_MSIX_ENTRY_SIZE;
+            int idx = msix_get_fixmap(msix, table_paddr, entry_paddr);
+
+            if ( idx >= 0 )
+                msix->adj_access_table_idx[ADJ_IDX_LAST] = idx;
+            else
+                gprintk(XENLOG_ERR, "Failed to map last MSI-X table page: %d\n", idx);
+        }
     }
     WARN_ON(msix->table.first != (table_paddr >> PAGE_SHIFT));
     ++msix->used_entries;
@@ -1090,6 +1118,16 @@ static void _pci_cleanup_msix(struct arch_msix *msix)
             WARN();
         msix->table.first = 0;
         msix->table.last = 0;
+        if ( msix->adj_access_table_idx[ADJ_IDX_FIRST] )
+        {
+            msix_put_fixmap(msix, msix->adj_access_table_idx[ADJ_IDX_FIRST]);
+            msix->adj_access_table_idx[ADJ_IDX_FIRST] = 0;
+        }
+        if ( msix->adj_access_table_idx[ADJ_IDX_LAST] )
+        {
+            msix_put_fixmap(msix, msix->adj_access_table_idx[ADJ_IDX_LAST]);
+            msix->adj_access_table_idx[ADJ_IDX_LAST] = 0;
+        }
         if ( rangeset_remove_range(mmio_ro_ranges, msix->pba.first,
                                    msix->pba.last) )
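[Editorial note: the core of the patch is the page-adjacency test in msixtbl_page_handler_get_hwaddr(): an address qualifies only if it shares a page with the table's first or last entry while lying outside the table itself. The standalone sketch below mirrors that logic with a hypothetical `classify()` helper — simplified types, no locking, and PAGE_SHIFT assumed to be 12 as on x86.]

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

/* Mirrors the two adjacency cases; ADJ_NONE means "not our business". */
enum adj { ADJ_NONE = -1, ADJ_IDX_FIRST = 0, ADJ_IDX_LAST = 1 };

static enum adj classify(uint64_t address, uint64_t gtable,
                         uint64_t table_len)
{
    /* Before the table start, but on the table's first page. */
    if ( PFN_DOWN(address) == PFN_DOWN(gtable) && address < gtable )
        return ADJ_IDX_FIRST;

    /* At or past the table end, but still on the table's last page. */
    if ( PFN_DOWN(address) == PFN_DOWN(gtable + table_len) &&
         address >= gtable + table_len )
        return ADJ_IDX_LAST;

    return ADJ_NONE; /* inside the table, or on an unrelated page */
}
```

For example, with a 16-entry table (256 bytes) at guest address 0xfe000800, an access at 0xfe000400 is classified ADJ_IDX_FIRST and one at 0xfe000a00 ADJ_IDX_LAST, while accesses within [0xfe000800, 0xfe000900) hit the table proper and are left to the existing MSI-X table handler.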
Some devices (notably the Intel Wifi 6 AX210 card) keep auxiliary
registers on the same page as the MSI-X table. The device model
(especially one in a stubdomain) cannot really handle those, as direct
writes to that page are refused (the page is on the mmio_ro_ranges
list). Instead, add an internal ioreq server that handles those writes.

Doing this requires correlating the write location with the guest's view
of the MSI-X table address. Since QEMU doesn't map the MSI-X table to
the guest, this relies on msixtbl_entry->gtable, which is HVM-only. A
similar feature for PV would need to be done separately.

This can also be used to read the Pending Bit Array, if it lives on the
same page, so that QEMU doesn't need /dev/mem access at all (especially
helpful with lockdown enabled in dom0). If the PBA lives on another
page, QEMU will map it to the guest directly.
If the PBA lives on the same page, forbid writes and log a message.
Technically, writes outside of the PBA could be allowed, but at this
moment the precise location of the PBA isn't saved.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
v2:
 - adjust commit message
 - pass struct domain to msixtbl_page_handler_get_hwaddr()
 - reduce local variables used only once
 - log a warning if a write is forbidden because MSI-X and PBA live on
   the same page
 - do not passthrough unaligned accesses
 - handle accesses both before and after the MSI-X table
---
 xen/arch/x86/hvm/vmsi.c        | 154 +++++++++++++++++++++++++++++++++
 xen/arch/x86/include/asm/msi.h |   5 ++
 xen/arch/x86/msi.c             |  38 ++++++++
 3 files changed, 197 insertions(+)