Message ID | 20231124104136.3263722-4-Jiqian.Chen@amd.com (mailing list archive) |
---|---|
State | Superseded |
Series | Support device passthrough when dom0 is PVH on Xen |
On Fri, Nov 24, 2023 at 06:41:36PM +0800, Jiqian Chen wrote: > In PVH dom0, it uses the linux local interrupt mechanism, > when it allocs irq for a gsi, it is dynamic, and follow > the principle of applying first, distributing first. And > if you debug the kernel codes, you will find the irq > number is alloced from small to large, but the applying > gsi number is not, may gsi 38 comes before gsi 28, that > causes the irq number is not equal with the gsi number. > And when we passthrough a device, QEMU will use its gsi > number to do mapping actions, see xen_pt_realize-> > xc_physdev_map_pirq, but the gsi number is got from file > /sys/bus/pci/devices/xxxx:xx:xx.x/irq in current code, > so it will fail when mapping. > And in current codes, there is no method to translate > irq to gsi for userspace. I think it would be cleaner to just introduce a new sysfs node that contains the gsi if a device is using one (much like the irq sysfs node)? Such ioctl to translate from IRQ to GSI has nothing to do with Xen, so placing it in privcmd does seem quite strange to me. I understand that for passthrough we need the GSI, but such translation layer from IRQ to GSI is all Linux internal, and it would be much simpler to just expose the GSI in sysfs IMO. Thanks, Roger.
On 28.11.23 15:25, Roger Pau Monné wrote: > On Fri, Nov 24, 2023 at 06:41:36PM +0800, Jiqian Chen wrote: >> In PVH dom0, it uses the linux local interrupt mechanism, >> when it allocs irq for a gsi, it is dynamic, and follow >> the principle of applying first, distributing first. And >> if you debug the kernel codes, you will find the irq >> number is alloced from small to large, but the applying >> gsi number is not, may gsi 38 comes before gsi 28, that >> causes the irq number is not equal with the gsi number. >> And when we passthrough a device, QEMU will use its gsi >> number to do mapping actions, see xen_pt_realize-> >> xc_physdev_map_pirq, but the gsi number is got from file >> /sys/bus/pci/devices/xxxx:xx:xx.x/irq in current code, >> so it will fail when mapping. >> And in current codes, there is no method to translate >> irq to gsi for userspace. > > I think it would be cleaner to just introduce a new sysfs node that > contains the gsi if a device is using one (much like the irq sysfs > node)? > > Such ioctl to translate from IRQ to GSI has nothing to do with Xen, so > placing it in privcmd does seem quite strange to me. I understand > that for passthrough we need the GSI, but such translation layer from > IRQ to GSI is all Linux internal, and it would be much simpler to just > expose the GSI in sysfs IMO. You are aware that we have a Xen specific variant of acpi_register_gsi()? It is the Xen event channel driver being responsible for the GSI<->IRQ mapping. Juergen
On Tue, Nov 28, 2023 at 03:42:31PM +0100, Juergen Gross wrote: > On 28.11.23 15:25, Roger Pau Monné wrote: > > On Fri, Nov 24, 2023 at 06:41:36PM +0800, Jiqian Chen wrote: > > > In PVH dom0, it uses the linux local interrupt mechanism, > > > when it allocs irq for a gsi, it is dynamic, and follow > > > the principle of applying first, distributing first. And > > > if you debug the kernel codes, you will find the irq > > > number is alloced from small to large, but the applying > > > gsi number is not, may gsi 38 comes before gsi 28, that > > > causes the irq number is not equal with the gsi number. > > > And when we passthrough a device, QEMU will use its gsi > > > number to do mapping actions, see xen_pt_realize-> > > > xc_physdev_map_pirq, but the gsi number is got from file > > > /sys/bus/pci/devices/xxxx:xx:xx.x/irq in current code, > > > so it will fail when mapping. > > > And in current codes, there is no method to translate > > > irq to gsi for userspace. > > > > I think it would be cleaner to just introduce a new sysfs node that > > contains the gsi if a device is using one (much like the irq sysfs > > node)? > > > > Such ioctl to translate from IRQ to GSI has nothing to do with Xen, so > > placing it in privcmd does seem quite strange to me. I understand > > that for passthrough we need the GSI, but such translation layer from > > IRQ to GSI is all Linux internal, and it would be much simpler to just > > expose the GSI in sysfs IMO. > > You are aware that we have a Xen specific variant of acpi_register_gsi()? > > It is the Xen event channel driver being responsible for the GSI<->IRQ > mapping. I'm kind of lost, this translation function is specifically needed for PVH which doesn't use the Xen specific variant of acpi_register_gsi(), and hence the IRQ <-> GSI relation is whatever the Linux kernel does on native. I do understand that on a PV dom0 the proposed sysfs gsi node would match the irq node, but that doesn't seem like an issue to me. 
Note also that PVH doesn't use acpi_register_gsi_xen_hvm() because XENFEAT_hvm_pirqs feature is not exposed to PVH, so I expect it uses the x86 acpi_register_gsi_ioapic(). Thanks, Roger.
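To make the mismatch described in the quoted patch description concrete, here is a toy model of first-come-first-served IRQ allocation. It is only a sketch under stated assumptions: `irq_table`, `register_gsi`, `gsi_from_irq` and the starting value `FIRST_DYNAMIC_IRQ` are hypothetical names invented for illustration, and this is not the kernel's actual allocator.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GSI 64
#define FIRST_DYNAMIC_IRQ 24 /* hypothetical first dynamically allocated IRQ */

/* Toy table: irq_of_gsi[gsi] == 0 means "this GSI has no IRQ yet". */
struct irq_table {
    int next_irq;
    int irq_of_gsi[MAX_GSI];
};

static void irq_table_init(struct irq_table *t)
{
    assert(t != NULL);
    t->next_irq = FIRST_DYNAMIC_IRQ;
    for (int gsi = 0; gsi < MAX_GSI; gsi++)
        t->irq_of_gsi[gsi] = 0;
}

/* IRQ numbers are handed out in ascending order, in registration order. */
static int register_gsi(struct irq_table *t, int gsi)
{
    assert(t != NULL && gsi >= 0 && gsi < MAX_GSI);
    if (t->irq_of_gsi[gsi] == 0)
        t->irq_of_gsi[gsi] = t->next_irq++;
    return t->irq_of_gsi[gsi];
}

/* Reverse lookup, the translation userspace is missing; -1 if unknown. */
static int gsi_from_irq(const struct irq_table *t, int irq)
{
    assert(t != NULL);
    if (irq <= 0)
        return -1;
    for (int gsi = 0; gsi < MAX_GSI; gsi++)
        if (t->irq_of_gsi[gsi] == irq)
            return gsi;
    return -1;
}
```

In this model, registering GSI 38 before GSI 28 gives GSI 38 the smaller IRQ number, so neither IRQ equals its GSI and only the reverse lookup recovers the value that the mapping call needs.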
On 28.11.23 17:11, Roger Pau Monné wrote: > On Tue, Nov 28, 2023 at 03:42:31PM +0100, Juergen Gross wrote: >> On 28.11.23 15:25, Roger Pau Monné wrote: >>> On Fri, Nov 24, 2023 at 06:41:36PM +0800, Jiqian Chen wrote: >>>> In PVH dom0, it uses the linux local interrupt mechanism, >>>> when it allocs irq for a gsi, it is dynamic, and follow >>>> the principle of applying first, distributing first. And >>>> if you debug the kernel codes, you will find the irq >>>> number is alloced from small to large, but the applying >>>> gsi number is not, may gsi 38 comes before gsi 28, that >>>> causes the irq number is not equal with the gsi number. >>>> And when we passthrough a device, QEMU will use its gsi >>>> number to do mapping actions, see xen_pt_realize-> >>>> xc_physdev_map_pirq, but the gsi number is got from file >>>> /sys/bus/pci/devices/xxxx:xx:xx.x/irq in current code, >>>> so it will fail when mapping. >>>> And in current codes, there is no method to translate >>>> irq to gsi for userspace. >>> >>> I think it would be cleaner to just introduce a new sysfs node that >>> contains the gsi if a device is using one (much like the irq sysfs >>> node)? >>> >>> Such ioctl to translate from IRQ to GSI has nothing to do with Xen, so >>> placing it in privcmd does seem quite strange to me. I understand >>> that for passthrough we need the GSI, but such translation layer from >>> IRQ to GSI is all Linux internal, and it would be much simpler to just >>> expose the GSI in sysfs IMO. >> >> You are aware that we have a Xen specific variant of acpi_register_gsi()? >> >> It is the Xen event channel driver being responsible for the GSI<->IRQ >> mapping. > > I'm kind of lost, this translation function is specifically needed for > PVH which doesn't use the Xen specific variant of acpi_register_gsi(), > and hence the IRQ <-> GSI relation is whatever the Linux kernel does > on native. 
> > I do understand that on a PV dom0 the proposed sysfs gsi node would > match the irq node, but that doesn't seem like an issue to me. > > Note also that PVH doesn't use acpi_register_gsi_xen_hvm() because > XENFEAT_hvm_pirqs feature is not exposed to PVH, so I expect it uses > the x86 acpi_register_gsi_ioapic(). Oh, I wasn't aware of this. Sorry for the noise. Juergen
On Tue, 28 Nov 2023, Roger Pau Monné wrote: > On Fri, Nov 24, 2023 at 06:41:36PM +0800, Jiqian Chen wrote: > > In PVH dom0, it uses the linux local interrupt mechanism, > > when it allocs irq for a gsi, it is dynamic, and follow > > the principle of applying first, distributing first. And > > if you debug the kernel codes, you will find the irq > > number is alloced from small to large, but the applying > > gsi number is not, may gsi 38 comes before gsi 28, that > > causes the irq number is not equal with the gsi number. > > And when we passthrough a device, QEMU will use its gsi > > number to do mapping actions, see xen_pt_realize-> > > xc_physdev_map_pirq, but the gsi number is got from file > > /sys/bus/pci/devices/xxxx:xx:xx.x/irq in current code, > > so it will fail when mapping. > > And in current codes, there is no method to translate > > irq to gsi for userspace. > > I think it would be cleaner to just introduce a new sysfs node that > contains the gsi if a device is using one (much like the irq sysfs > node)? > > Such ioctl to translate from IRQ to GSI has nothing to do with Xen, so > placing it in privcmd does seem quite strange to me. I understand > that for passthrough we need the GSI, but such translation layer from > IRQ to GSI is all Linux internal, and it would be much simpler to just > expose the GSI in sysfs IMO. Maybe something to add to drivers/xen/sys-hypervisor.c in Linux. Juergen what do you think?
On Wed, 29 Nov 2023, Stefano Stabellini wrote: > On Tue, 28 Nov 2023, Roger Pau Monné wrote: > > On Fri, Nov 24, 2023 at 06:41:36PM +0800, Jiqian Chen wrote: > > > In PVH dom0, it uses the linux local interrupt mechanism, > > > when it allocs irq for a gsi, it is dynamic, and follow > > > the principle of applying first, distributing first. And > > > if you debug the kernel codes, you will find the irq > > > number is alloced from small to large, but the applying > > > gsi number is not, may gsi 38 comes before gsi 28, that > > > causes the irq number is not equal with the gsi number. > > > And when we passthrough a device, QEMU will use its gsi > > > number to do mapping actions, see xen_pt_realize-> > > > xc_physdev_map_pirq, but the gsi number is got from file > > > /sys/bus/pci/devices/xxxx:xx:xx.x/irq in current code, > > > so it will fail when mapping. > > > And in current codes, there is no method to translate > > > irq to gsi for userspace. > > > > I think it would be cleaner to just introduce a new sysfs node that > > contains the gsi if a device is using one (much like the irq sysfs > > node)? > > > > Such ioctl to translate from IRQ to GSI has nothing to do with Xen, so > > placing it in privcmd does seem quite strange to me. I understand > > that for passthrough we need the GSI, but such translation layer from > > IRQ to GSI is all Linux internal, and it would be much simpler to just > > expose the GSI in sysfs IMO. > > Maybe something to add to drivers/xen/sys-hypervisor.c in Linux. > Juergen what do you think? Let me also add that privcmd.c is already a Linux specific interface. Although it was born to be a Xen hypercall "proxy", in reality today we have a few privcmd ioctls that don't translate into hypercalls. So I don't think that adding IOCTL_PRIVCMD_GSI_FROM_IRQ would be a problem.
On 2023/11/28 22:25, Roger Pau Monné wrote: > On Fri, Nov 24, 2023 at 06:41:36PM +0800, Jiqian Chen wrote: >> In PVH dom0, it uses the linux local interrupt mechanism, >> when it allocs irq for a gsi, it is dynamic, and follow >> the principle of applying first, distributing first. And >> if you debug the kernel codes, you will find the irq >> number is alloced from small to large, but the applying >> gsi number is not, may gsi 38 comes before gsi 28, that >> causes the irq number is not equal with the gsi number. >> And when we passthrough a device, QEMU will use its gsi >> number to do mapping actions, see xen_pt_realize-> >> xc_physdev_map_pirq, but the gsi number is got from file >> /sys/bus/pci/devices/xxxx:xx:xx.x/irq in current code, >> so it will fail when mapping. >> And in current codes, there is no method to translate >> irq to gsi for userspace. > > I think it would be cleaner to just introduce a new sysfs node that > contains the gsi if a device is using one (much like the irq sysfs > node)? Yes, I had also considered this approach: adding a sysfs node at /sys/bus/pci/devices/xxxx:xx:xx.x/gsi. But I am not sure which is better, sysfs or privcmd. If I use a sysfs node, do I need to wrap the code in a Xen-specific macro? And is acpi_register_gsi_ioapic a suitable place to create it? > > Such ioctl to translate from IRQ to GSI has nothing to do with Xen, so > placing it in privcmd does seem quite strange to me. I understand > that for passthrough we need the GSI, but such translation layer from > IRQ to GSI is all Linux internal, and it would be much simpler to just > expose the GSI in sysfs IMO. > > Thanks, Roger.
On 2023/11/29 00:11, Roger Pau Monné wrote: > On Tue, Nov 28, 2023 at 03:42:31PM +0100, Juergen Gross wrote: >> On 28.11.23 15:25, Roger Pau Monné wrote: >>> On Fri, Nov 24, 2023 at 06:41:36PM +0800, Jiqian Chen wrote: >>>> In PVH dom0, it uses the linux local interrupt mechanism, >>>> when it allocs irq for a gsi, it is dynamic, and follow >>>> the principle of applying first, distributing first. And >>>> if you debug the kernel codes, you will find the irq >>>> number is alloced from small to large, but the applying >>>> gsi number is not, may gsi 38 comes before gsi 28, that >>>> causes the irq number is not equal with the gsi number. >>>> And when we passthrough a device, QEMU will use its gsi >>>> number to do mapping actions, see xen_pt_realize-> >>>> xc_physdev_map_pirq, but the gsi number is got from file >>>> /sys/bus/pci/devices/xxxx:xx:xx.x/irq in current code, >>>> so it will fail when mapping. >>>> And in current codes, there is no method to translate >>>> irq to gsi for userspace. >>> >>> I think it would be cleaner to just introduce a new sysfs node that >>> contains the gsi if a device is using one (much like the irq sysfs >>> node)? >>> >>> Such ioctl to translate from IRQ to GSI has nothing to do with Xen, so >>> placing it in privcmd does seem quite strange to me. I understand >>> that for passthrough we need the GSI, but such translation layer from >>> IRQ to GSI is all Linux internal, and it would be much simpler to just >>> expose the GSI in sysfs IMO. >> >> You are aware that we have a Xen specific variant of acpi_register_gsi()? >> >> It is the Xen event channel driver being responsible for the GSI<->IRQ >> mapping. > > I'm kind of lost, this translation function is specifically needed for > PVH which doesn't use the Xen specific variant of acpi_register_gsi(), > and hence the IRQ <-> GSI relation is whatever the Linux kernel does > on native. 
> > I do understand that on a PV dom0 the proposed sysfs gsi node would > match the irq node, but that doesn't seem like an issue to me. > > Note also that PVH doesn't use acpi_register_gsi_xen_hvm() because > XENFEAT_hvm_pirqs feature is not exposed to PVH, so I expect it uses > the x86 acpi_register_gsi_ioapic(). Yes, PVH uses acpi_register_gsi_ioapic. Thanks, Roger, for the explanation. > > Thanks, Roger.
On Wed, Nov 29, 2023 at 08:02:40PM -0800, Stefano Stabellini wrote: > n Wed, 29 Nov 2023, Stefano Stabellini wrote: > > On Tue, 28 Nov 2023, Roger Pau Monné wrote: > > > On Fri, Nov 24, 2023 at 06:41:36PM +0800, Jiqian Chen wrote: > > > > In PVH dom0, it uses the linux local interrupt mechanism, > > > > when it allocs irq for a gsi, it is dynamic, and follow > > > > the principle of applying first, distributing first. And > > > > if you debug the kernel codes, you will find the irq > > > > number is alloced from small to large, but the applying > > > > gsi number is not, may gsi 38 comes before gsi 28, that > > > > causes the irq number is not equal with the gsi number. > > > > And when we passthrough a device, QEMU will use its gsi > > > > number to do mapping actions, see xen_pt_realize-> > > > > xc_physdev_map_pirq, but the gsi number is got from file > > > > /sys/bus/pci/devices/xxxx:xx:xx.x/irq in current code, > > > > so it will fail when mapping. > > > > And in current codes, there is no method to translate > > > > irq to gsi for userspace. > > > > > > I think it would be cleaner to just introduce a new sysfs node that > > > contains the gsi if a device is using one (much like the irq sysfs > > > node)? > > > > > > Such ioctl to translate from IRQ to GSI has nothing to do with Xen, so > > > placing it in privcmd does seem quite strange to me. I understand > > > that for passthrough we need the GSI, but such translation layer from > > > IRQ to GSI is all Linux internal, and it would be much simpler to just > > > expose the GSI in sysfs IMO. > > > > Maybe something to add to drivers/xen/sys-hypervisor.c in Linux. > > Juergen what do you think? > > Let me also add that privcmd.c is already a Linux specific interface. > Although it was born to be a Xen hypercall "proxy" in reality today we > have a few privcmd ioctls that don't translate into hypercalls. So I > don't think that adding IOCTL_PRIVCMD_GSI_FROM_IRQ would be a problem. 
Maybe not all ioctls translate to hypercalls (I guess you are referring to the IOCTL_PRIVCMD_RESTRICT ioctl), but they are all Xen-specific actions. Getting the GSI used by a device has nothing to do with Xen. IMO drivers/xen/sys-hypervisor.c is also not appropriate, but I'm not the maintainer of any of those components. There's nothing Xen specific about fetching the GSI associated with a PCI device. The fact that Xen needs it for passthrough is just a red herring: further cases where the GSI is needed might arise outside of Xen, and hence such a node would be better placed in a generic location. The right location should be /sys/bus/pci/devices/<sbdf>/gsi. Thanks, Roger.
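As a rough illustration of how a toolstack could consume the node suggested above, the sketch below reads a `gsi` attribute from a device directory and falls back to `irq` when it is absent. Several things here are assumptions, not facts from the thread: the `gsi` node is only a proposal at this point, its content is assumed to be a single decimal number like the existing `irq` node, the fallback policy is a guess, and the helper names `read_uint_attr` and `device_gsi` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Read one unsigned decimal value from a sysfs-style attribute file.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
static int read_uint_attr(const char *path, unsigned int *out)
{
    assert(path != NULL && out != NULL);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    int ok = (fscanf(f, "%u", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Prefer the proposed (hypothetical) "gsi" node; fall back to "irq".
 * dev_dir would be something like /sys/bus/pci/devices/<sbdf>. */
static int device_gsi(const char *dev_dir, unsigned int *gsi)
{
    char path[512];
    int n;

    assert(dev_dir != NULL && gsi != NULL);

    n = snprintf(path, sizeof(path), "%s/gsi", dev_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    if (read_uint_attr(path, gsi) == 0)
        return 0;

    /* Assumed fallback: where irq and gsi coincide, the irq node suffices. */
    n = snprintf(path, sizeof(path), "%s/irq", dev_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    return read_uint_attr(path, gsi);
}
```

Keeping the lookup in a plain file read means the caller needs no Xen-specific handle, which matches the argument that the information is generic.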
On Thu, 30 Nov 2023, Roger Pau Monné wrote: > On Wed, Nov 29, 2023 at 08:02:40PM -0800, Stefano Stabellini wrote: > > n Wed, 29 Nov 2023, Stefano Stabellini wrote: > > > On Tue, 28 Nov 2023, Roger Pau Monné wrote: > > > > On Fri, Nov 24, 2023 at 06:41:36PM +0800, Jiqian Chen wrote: > > > > > In PVH dom0, it uses the linux local interrupt mechanism, > > > > > when it allocs irq for a gsi, it is dynamic, and follow > > > > > the principle of applying first, distributing first. And > > > > > if you debug the kernel codes, you will find the irq > > > > > number is alloced from small to large, but the applying > > > > > gsi number is not, may gsi 38 comes before gsi 28, that > > > > > causes the irq number is not equal with the gsi number. > > > > > And when we passthrough a device, QEMU will use its gsi > > > > > number to do mapping actions, see xen_pt_realize-> > > > > > xc_physdev_map_pirq, but the gsi number is got from file > > > > > /sys/bus/pci/devices/xxxx:xx:xx.x/irq in current code, > > > > > so it will fail when mapping. > > > > > And in current codes, there is no method to translate > > > > > irq to gsi for userspace. > > > > > > > > I think it would be cleaner to just introduce a new sysfs node that > > > > contains the gsi if a device is using one (much like the irq sysfs > > > > node)? > > > > > > > > Such ioctl to translate from IRQ to GSI has nothing to do with Xen, so > > > > placing it in privcmd does seem quite strange to me. I understand > > > > that for passthrough we need the GSI, but such translation layer from > > > > IRQ to GSI is all Linux internal, and it would be much simpler to just > > > > expose the GSI in sysfs IMO. > > > > > > Maybe something to add to drivers/xen/sys-hypervisor.c in Linux. > > > Juergen what do you think? > > > > Let me also add that privcmd.c is already a Linux specific interface. 
> > Although it was born to be a Xen hypercall "proxy" in reality today we > > have a few privcmd ioctls that don't translate into hypercalls. So I > > don't think that adding IOCTL_PRIVCMD_GSI_FROM_IRQ would be a problem. > > Maybe not all ioctls translate to hypercalls (I guess you are > referring to the IOCTL_PRIVCMD_RESTRICT ioctl), but they are specific > Xen actions. Getting the GSI used by a device has nothing do to with > Xen. > > IMO drivers/xen/sys-hypervisor.c is also not appropriate, but I'm not > the maintainer of any of those components. > > There's nothing Xen specific about fetching the GSI associated with a > PCI device. The fact that Xen needs it for passthrough is just a red > herring, further cases where the GSI is needed might arise outside of > Xen, and hence such node would better be placed in a generic > location. The right location should be /sys/bus/pci/devices/<sbdf>/gsi. That might be true but /sys/bus/pci/devices/<sbdf>/gsi is a non-Xen generic interface and the maintainers of that portion of Linux code might have a different opinion. We'll have to see.
On Thu, Nov 30, 2023 at 07:09:12PM -0800, Stefano Stabellini wrote: > On Thu, 30 Nov 2023, Roger Pau Monné wrote: > > On Wed, Nov 29, 2023 at 08:02:40PM -0800, Stefano Stabellini wrote: > > > n Wed, 29 Nov 2023, Stefano Stabellini wrote: > > > > On Tue, 28 Nov 2023, Roger Pau Monné wrote: > > > > > On Fri, Nov 24, 2023 at 06:41:36PM +0800, Jiqian Chen wrote: > > > > > > In PVH dom0, it uses the linux local interrupt mechanism, > > > > > > when it allocs irq for a gsi, it is dynamic, and follow > > > > > > the principle of applying first, distributing first. And > > > > > > if you debug the kernel codes, you will find the irq > > > > > > number is alloced from small to large, but the applying > > > > > > gsi number is not, may gsi 38 comes before gsi 28, that > > > > > > causes the irq number is not equal with the gsi number. > > > > > > And when we passthrough a device, QEMU will use its gsi > > > > > > number to do mapping actions, see xen_pt_realize-> > > > > > > xc_physdev_map_pirq, but the gsi number is got from file > > > > > > /sys/bus/pci/devices/xxxx:xx:xx.x/irq in current code, > > > > > > so it will fail when mapping. > > > > > > And in current codes, there is no method to translate > > > > > > irq to gsi for userspace. > > > > > > > > > > I think it would be cleaner to just introduce a new sysfs node that > > > > > contains the gsi if a device is using one (much like the irq sysfs > > > > > node)? > > > > > > > > > > Such ioctl to translate from IRQ to GSI has nothing to do with Xen, so > > > > > placing it in privcmd does seem quite strange to me. I understand > > > > > that for passthrough we need the GSI, but such translation layer from > > > > > IRQ to GSI is all Linux internal, and it would be much simpler to just > > > > > expose the GSI in sysfs IMO. > > > > > > > > Maybe something to add to drivers/xen/sys-hypervisor.c in Linux. > > > > Juergen what do you think? > > > > > > Let me also add that privcmd.c is already a Linux specific interface. 
> > > Although it was born to be a Xen hypercall "proxy" in reality today we > > > have a few privcmd ioctls that don't translate into hypercalls. So I > > > don't think that adding IOCTL_PRIVCMD_GSI_FROM_IRQ would be a problem. > > > > Maybe not all ioctls translate to hypercalls (I guess you are > > referring to the IOCTL_PRIVCMD_RESTRICT ioctl), but they are specific > > Xen actions. Getting the GSI used by a device has nothing do to with > > Xen. > > > > IMO drivers/xen/sys-hypervisor.c is also not appropriate, but I'm not > > the maintainer of any of those components. > > > > There's nothing Xen specific about fetching the GSI associated with a > > PCI device. The fact that Xen needs it for passthrough is just a red > > herring, further cases where the GSI is needed might arise outside of > > Xen, and hence such node would better be placed in a generic > > location. The right location should be /sys/bus/pci/devices/<sbdf>/gsi. > > That might be true but /sys/bus/pci/devices/<sbdf>/gsi is a non-Xen > generic interface and the maintainers of that portion of Linux code > might have a different opinion. We'll have to see. Right, but before resorting to implement a Xen specific workaround let's attempt to do it the proper way :). I cannot see why exposing the gsi on sysfs like that would be an issue. There's a lot of resource information exposed on sysfs already, and it's a trivial node to implement. Thanks, Roger.
On 2023/12/1 17:03, Roger Pau Monné wrote: > On Thu, Nov 30, 2023 at 07:09:12PM -0800, Stefano Stabellini wrote: >> On Thu, 30 Nov 2023, Roger Pau Monné wrote: >>> On Wed, Nov 29, 2023 at 08:02:40PM -0800, Stefano Stabellini wrote: >>>> n Wed, 29 Nov 2023, Stefano Stabellini wrote: >>>>> On Tue, 28 Nov 2023, Roger Pau Monné wrote: >>>>>> On Fri, Nov 24, 2023 at 06:41:36PM +0800, Jiqian Chen wrote: >>>>>>> In PVH dom0, it uses the linux local interrupt mechanism, >>>>>>> when it allocs irq for a gsi, it is dynamic, and follow >>>>>>> the principle of applying first, distributing first. And >>>>>>> if you debug the kernel codes, you will find the irq >>>>>>> number is alloced from small to large, but the applying >>>>>>> gsi number is not, may gsi 38 comes before gsi 28, that >>>>>>> causes the irq number is not equal with the gsi number. >>>>>>> And when we passthrough a device, QEMU will use its gsi >>>>>>> number to do mapping actions, see xen_pt_realize-> >>>>>>> xc_physdev_map_pirq, but the gsi number is got from file >>>>>>> /sys/bus/pci/devices/xxxx:xx:xx.x/irq in current code, >>>>>>> so it will fail when mapping. >>>>>>> And in current codes, there is no method to translate >>>>>>> irq to gsi for userspace. >>>>>> >>>>>> I think it would be cleaner to just introduce a new sysfs node that >>>>>> contains the gsi if a device is using one (much like the irq sysfs >>>>>> node)? >>>>>> >>>>>> Such ioctl to translate from IRQ to GSI has nothing to do with Xen, so >>>>>> placing it in privcmd does seem quite strange to me. I understand >>>>>> that for passthrough we need the GSI, but such translation layer from >>>>>> IRQ to GSI is all Linux internal, and it would be much simpler to just >>>>>> expose the GSI in sysfs IMO. >>>>> >>>>> Maybe something to add to drivers/xen/sys-hypervisor.c in Linux. >>>>> Juergen what do you think? >>>> >>>> Let me also add that privcmd.c is already a Linux specific interface. 
>>>> Although it was born to be a Xen hypercall "proxy" in reality today we >>>> have a few privcmd ioctls that don't translate into hypercalls. So I >>>> don't think that adding IOCTL_PRIVCMD_GSI_FROM_IRQ would be a problem. >>> >>> Maybe not all ioctls translate to hypercalls (I guess you are >>> referring to the IOCTL_PRIVCMD_RESTRICT ioctl), but they are specific >>> Xen actions. Getting the GSI used by a device has nothing do to with >>> Xen. >>> >>> IMO drivers/xen/sys-hypervisor.c is also not appropriate, but I'm not >>> the maintainer of any of those components. >>> >>> There's nothing Xen specific about fetching the GSI associated with a >>> PCI device. The fact that Xen needs it for passthrough is just a red >>> herring, further cases where the GSI is needed might arise outside of >>> Xen, and hence such node would better be placed in a generic >>> location. The right location should be /sys/bus/pci/devices/<sbdf>/gsi. >> >> That might be true but /sys/bus/pci/devices/<sbdf>/gsi is a non-Xen >> generic interface and the maintainers of that portion of Linux code >> might have a different opinion. We'll have to see. > > Right, but before resorting to implement a Xen specific workaround > let's attempt to do it the proper way :). > > I cannot see why exposing the gsi on sysfs like that would be an > issue. There's a lot of resource information exposed on sysfs > already, and it's a trivial node to implement. Thanks to both of you for your suggestions. At present, the outcome of the discussion seems to be that a gsi sysfs node should be added. I will make that change in the next version and add the corresponding maintainers to the review list. > > Thanks, Roger.
On Mon, Dec 04, 2023 at 05:38:06AM +0000, Chen, Jiqian wrote: > On 2023/12/1 17:03, Roger Pau Monné wrote: > > On Thu, Nov 30, 2023 at 07:09:12PM -0800, Stefano Stabellini wrote: > >> On Thu, 30 Nov 2023, Roger Pau Monné wrote: > >>> On Wed, Nov 29, 2023 at 08:02:40PM -0800, Stefano Stabellini wrote: > >>>> n Wed, 29 Nov 2023, Stefano Stabellini wrote: > >>>>> On Tue, 28 Nov 2023, Roger Pau Monné wrote: > >>>>>> On Fri, Nov 24, 2023 at 06:41:36PM +0800, Jiqian Chen wrote: > >>>>>>> In PVH dom0, it uses the linux local interrupt mechanism, > >>>>>>> when it allocs irq for a gsi, it is dynamic, and follow > >>>>>>> the principle of applying first, distributing first. And > >>>>>>> if you debug the kernel codes, you will find the irq > >>>>>>> number is alloced from small to large, but the applying > >>>>>>> gsi number is not, may gsi 38 comes before gsi 28, that > >>>>>>> causes the irq number is not equal with the gsi number. > >>>>>>> And when we passthrough a device, QEMU will use its gsi > >>>>>>> number to do mapping actions, see xen_pt_realize-> > >>>>>>> xc_physdev_map_pirq, but the gsi number is got from file > >>>>>>> /sys/bus/pci/devices/xxxx:xx:xx.x/irq in current code, > >>>>>>> so it will fail when mapping. > >>>>>>> And in current codes, there is no method to translate > >>>>>>> irq to gsi for userspace. > >>>>>> > >>>>>> I think it would be cleaner to just introduce a new sysfs node that > >>>>>> contains the gsi if a device is using one (much like the irq sysfs > >>>>>> node)? > >>>>>> > >>>>>> Such ioctl to translate from IRQ to GSI has nothing to do with Xen, so > >>>>>> placing it in privcmd does seem quite strange to me. I understand > >>>>>> that for passthrough we need the GSI, but such translation layer from > >>>>>> IRQ to GSI is all Linux internal, and it would be much simpler to just > >>>>>> expose the GSI in sysfs IMO. > >>>>> > >>>>> Maybe something to add to drivers/xen/sys-hypervisor.c in Linux. > >>>>> Juergen what do you think? 
> >>>> > >>>> Let me also add that privcmd.c is already a Linux specific interface. > >>>> Although it was born to be a Xen hypercall "proxy" in reality today we > >>>> have a few privcmd ioctls that don't translate into hypercalls. So I > >>>> don't think that adding IOCTL_PRIVCMD_GSI_FROM_IRQ would be a problem. > >>> > >>> Maybe not all ioctls translate to hypercalls (I guess you are > >>> referring to the IOCTL_PRIVCMD_RESTRICT ioctl), but they are specific > >>> Xen actions. Getting the GSI used by a device has nothing do to with > >>> Xen. > >>> > >>> IMO drivers/xen/sys-hypervisor.c is also not appropriate, but I'm not > >>> the maintainer of any of those components. > >>> > >>> There's nothing Xen specific about fetching the GSI associated with a > >>> PCI device. The fact that Xen needs it for passthrough is just a red > >>> herring, further cases where the GSI is needed might arise outside of > >>> Xen, and hence such node would better be placed in a generic > >>> location. The right location should be /sys/bus/pci/devices/<sbdf>/gsi. > >> > >> That might be true but /sys/bus/pci/devices/<sbdf>/gsi is a non-Xen > >> generic interface and the maintainers of that portion of Linux code > >> might have a different opinion. We'll have to see. > > > > Right, but before resorting to implement a Xen specific workaround > > let's attempt to do it the proper way :). > > > > I cannot see why exposing the gsi on sysfs like that would be an > > issue. There's a lot of resource information exposed on sysfs > > already, and it's a trivial node to implement. > Thanks for both of you' s suggestions. At present, it seems the result of discussion is that it needs to add a gsi sysfs. I will modify it in the next version and then add the corresponding maintainer to the review list. Thanks, please keep xen-devel on Cc if possible. Maybe if the suggested path is not suitable maintainers can recommend another path where the gsi (or equivalent) node could live. Thanks, Roger.
diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h
index bc60e8fd55..ba4b8c3054 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
     __u64 addr;
 } privcmd_mmap_resource_t;
 
+typedef struct privcmd_gsi_from_irq {
+    __u32 irq;
+    __u32 gsi;
+} privcmd_gsi_from_irq_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
     _IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
 #define IOCTL_PRIVCMD_MMAP_RESOURCE \
     _IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
+#define IOCTL_PRIVCMD_GSI_FROM_IRQ \
+    _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_irq_t))
 #define IOCTL_PRIVCMD_UNIMPLEMENTED \
     _IOC(_IOC_NONE, 'P', 0xFF, 0)
 
diff --git a/tools/include/xencall.h b/tools/include/xencall.h
index fc95ed0fe5..962cb45e1f 100644
--- a/tools/include/xencall.h
+++ b/tools/include/xencall.h
@@ -113,6 +113,8 @@ int xencall5(xencall_handle *xcall, unsigned int op,
              uint64_t arg1, uint64_t arg2, uint64_t arg3,
              uint64_t arg4, uint64_t arg5);
 
+int xen_oscall_gsi_from_irq(xencall_handle *xcall, int irq);
+
 /* Variant(s) of the above, as needed, returning "long" instead of "int". */
 long xencall2L(xencall_handle *xcall, unsigned int op,
                uint64_t arg1, uint64_t arg2);
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 2ef8b4e054..2b9d55d2c6 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
                           uint32_t domid,
                           int pirq);
 
+int xc_physdev_gsi_from_irq(xc_interface *xch, int irq);
+
 /*
  * LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/call/core.c b/tools/libs/call/core.c
index 02c4f8e1ae..6f79f3babd 100644
--- a/tools/libs/call/core.c
+++ b/tools/libs/call/core.c
@@ -173,6 +173,11 @@ int xencall5(xencall_handle *xcall, unsigned int op,
     return osdep_hypercall(xcall, &call);
 }
 
+int xen_oscall_gsi_from_irq(xencall_handle *xcall, int irq)
+{
+    return osdep_oscall(xcall, irq);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/call/libxencall.map b/tools/libs/call/libxencall.map
index d18a3174e9..6cde8eda05 100644
--- a/tools/libs/call/libxencall.map
+++ b/tools/libs/call/libxencall.map
@@ -10,6 +10,8 @@ VERS_1.0 {
         xencall4;
         xencall5;
 
+        xen_oscall_gsi_from_irq;
+
         xencall_alloc_buffer;
         xencall_free_buffer;
         xencall_alloc_buffer_pages;
diff --git a/tools/libs/call/linux.c b/tools/libs/call/linux.c
index 6d588e6bea..5267bceabf 100644
--- a/tools/libs/call/linux.c
+++ b/tools/libs/call/linux.c
@@ -85,6 +85,20 @@ long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall)
     return ioctl(xcall->fd, IOCTL_PRIVCMD_HYPERCALL, hypercall);
 }
 
+long osdep_oscall(xencall_handle *xcall, int irq)
+{
+    privcmd_gsi_from_irq_t gsi_irq = {
+        .irq = irq,
+        .gsi = -1,
+    };
+
+    if (ioctl(xcall->fd, IOCTL_PRIVCMD_GSI_FROM_IRQ, &gsi_irq)) {
+        return gsi_irq.irq;
+    }
+
+    return gsi_irq.gsi;
+}
+
 static void *alloc_pages_bufdev(xencall_handle *xcall, size_t npages)
 {
     void *p;
diff --git a/tools/libs/call/private.h b/tools/libs/call/private.h
index 9c3aa432ef..01a1f5076a 100644
--- a/tools/libs/call/private.h
+++ b/tools/libs/call/private.h
@@ -57,6 +57,15 @@ int osdep_xencall_close(xencall_handle *xcall);
 
 long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall);
 
+#if defined(__linux__)
+long osdep_oscall(xencall_handle *xcall, int irq);
+#else
+static inline long osdep_oscall(xencall_handle *xcall, int irq)
+{
+    return irq;
+}
+#endif
+
 void *osdep_alloc_pages(xencall_handle *xcall, size_t nr_pages);
 void osdep_free_pages(xencall_handle *xcall, void *p, size_t nr_pages);
 
diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
index 460a8e779c..4d3b138ebd 100644
--- a/tools/libs/ctrl/xc_physdev.c
+++ b/tools/libs/ctrl/xc_physdev.c
@@ -111,3 +111,7 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
     return rc;
 }
 
+int xc_physdev_gsi_from_irq(xc_interface *xch, int irq)
+{
+    return xen_oscall_gsi_from_irq(xch->xcall, irq);
+}
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 96cb4da079..ba8803dab4 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc,
         goto out_no_irq;
     }
     if ((fscanf(f, "%u", &irq) == 1) && irq) {
+        irq = xc_physdev_gsi_from_irq(ctx->xch, irq);
         r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
         if (r < 0) {
             LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",
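For readers who want to exercise the wrapper logic from the `linux.c` hunk outside the Xen tree, the following stand-alone sketch may help. It copies the struct layout and ioctl number from the diff, which belong to a superseded proposal and should not be read as a merged ABI. The helpers `query_gsi_from_irq` and `gsi_or_irq` are hypothetical names; unlike the hunk, the first one reports failure separately so that the fallback to the IRQ number is an explicit choice by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Layout and request number as proposed in the privcmd.h hunk above
 * (an assumption for illustration, not a merged interface). */
typedef struct privcmd_gsi_from_irq {
    uint32_t irq;
    uint32_t gsi;
} privcmd_gsi_from_irq_t;

#define IOCTL_PRIVCMD_GSI_FROM_IRQ \
    _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_irq_t))

/* Returns 0 and stores the GSI on success; returns -1 on failure,
 * leaving *gsi untouched and errno as set by ioctl(). */
static int query_gsi_from_irq(int privcmd_fd, uint32_t irq, uint32_t *gsi)
{
    privcmd_gsi_from_irq_t req = { .irq = irq, .gsi = UINT32_MAX };

    assert(gsi != NULL);
    if (ioctl(privcmd_fd, IOCTL_PRIVCMD_GSI_FROM_IRQ, &req) != 0)
        return -1;

    *gsi = req.gsi;
    return 0;
}

/* Same policy as the hunk: use the IRQ number when no translation
 * is available (for example on a kernel without the ioctl). */
static uint32_t gsi_or_irq(int privcmd_fd, uint32_t irq)
{
    uint32_t gsi;

    return query_gsi_from_irq(privcmd_fd, irq, &gsi) == 0 ? gsi : irq;
}
```

Calling `gsi_or_irq()` with a descriptor that does not support the request simply yields the IRQ number back, which is the behaviour the non-Linux stub in `private.h` also provides.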