
[RFC,XEN,6/6] tools/libs/light: pci: translate irq to gsi

Message ID 20230312075455.450187-7-ray.huang@amd.com
State New, archived
Series Introduce VirtIO GPU and Passthrough GPU support on Xen PVH dom0

Commit Message

Huang Rui March 12, 2023, 7:54 a.m. UTC
From: Chen Jiqian <Jiqian.Chen@amd.com>

Use new xc_physdev_gsi_from_irq to get the GSI number

Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 tools/libs/light/libxl_pci.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Jan Beulich March 14, 2023, 4:39 p.m. UTC | #1
On 12.03.2023 08:54, Huang Rui wrote:
> From: Chen Jiqian <Jiqian.Chen@amd.com>
> 
> Use new xc_physdev_gsi_from_irq to get the GSI number

Apart from again the "Why?", ...

> --- a/tools/libs/light/libxl_pci.c
> +++ b/tools/libs/light/libxl_pci.c
> @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc,
>          goto out_no_irq;
>      }
>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
> +        irq = xc_physdev_gsi_from_irq(ctx->xch, irq);
>          r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
>          if (r < 0) {
>              LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",

... aren't you breaking existing use cases this way?

Jan
Roger Pau Monne March 15, 2023, 4:35 p.m. UTC | #2
On Sun, Mar 12, 2023 at 03:54:55PM +0800, Huang Rui wrote:
> From: Chen Jiqian <Jiqian.Chen@amd.com>
> 
> Use new xc_physdev_gsi_from_irq to get the GSI number
> 
> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>  tools/libs/light/libxl_pci.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
> index f4c4f17545..47cf2799bf 100644
> --- a/tools/libs/light/libxl_pci.c
> +++ b/tools/libs/light/libxl_pci.c
> @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc,
>          goto out_no_irq;
>      }
>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
> +        irq = xc_physdev_gsi_from_irq(ctx->xch, irq);

This is just a shot in the dark, because I don't really have enough
context to understand what's going on here, but see below.

I've taken a look at this on my box, and it seems like on
dom0 the value returned by /sys/bus/pci/devices/SBDF/irq is not
very consistent.

If devices are in use by a driver the irq sysfs node reports either
the GSI irq or the MSI IRQ (in case a single MSI interrupt is
setup).

It seems like pciback in Linux does something to report the correct
value:

root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq
74
root@lcy2-dt107:~# xl pci-assignable-add 00:14.0
root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq
16

As you can see, making the device assignable changed the value
reported by the irq node to be the GSI instead of the MSI IRQ, I would
think you are missing something similar in the PVH setup (some pciback
magic)?

Albeit I have no idea why you would need to translate from IRQ to GSI
in the way you do in this and related patches, because I'm missing the
context.

Regards, Roger.
Stefano Stabellini March 16, 2023, 12:44 a.m. UTC | #3
On Wed, 15 Mar 2023, Roger Pau Monné wrote:
> On Sun, Mar 12, 2023 at 03:54:55PM +0800, Huang Rui wrote:
> > From: Chen Jiqian <Jiqian.Chen@amd.com>
> > 
> > Use new xc_physdev_gsi_from_irq to get the GSI number
> > 
> > Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
> > Signed-off-by: Huang Rui <ray.huang@amd.com>
> > ---
> >  tools/libs/light/libxl_pci.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
> > index f4c4f17545..47cf2799bf 100644
> > --- a/tools/libs/light/libxl_pci.c
> > +++ b/tools/libs/light/libxl_pci.c
> > @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc,
> >          goto out_no_irq;
> >      }
> >      if ((fscanf(f, "%u", &irq) == 1) && irq) {
> > +        irq = xc_physdev_gsi_from_irq(ctx->xch, irq);
> 
> This is just a shot in the dark, because I don't really have enough
> context to understand what's going on here, but see below.
> 
> I've taken a look at this on my box, and it seems like on
> dom0 the value returned by /sys/bus/pci/devices/SBDF/irq is not
> very consistent.
> 
> If devices are in use by a driver the irq sysfs node reports either
> the GSI irq or the MSI IRQ (in case a single MSI interrupt is
> setup).
> 
> It seems like pciback in Linux does something to report the correct
> value:
> 
> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq
> 74
> root@lcy2-dt107:~# xl pci-assignable-add 00:14.0
> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq
> 16
> 
> As you can see, making the device assignable changed the value
> reported by the irq node to be the GSI instead of the MSI IRQ, I would
> think you are missing something similar in the PVH setup (some pciback
> magic)?
> 
> Albeit I have no idea why you would need to translate from IRQ to GSI
> in the way you do in this and related patches, because I'm missing the
> context.

As I mention in another email, also keep in mind that we need QEMU to
work and QEMU calls:
1) xc_physdev_map_pirq (this is also called from libxl)
2) xc_domain_bind_pt_pci_irq


In this case IRQ != GSI (IRQ == 112, GSI == 28). Sysfs returns the IRQ
in Linux (112), but actually xc_physdev_map_pirq expects the GSI, not
the IRQ. If you look at the implementation of xc_physdev_map_pirq,
you'll see that the type is "MAP_PIRQ_TYPE_GSI", and also see the check in Xen
xen/arch/x86/irq.c:allocate_and_map_gsi_pirq:

    if ( index < 0 || index >= nr_irqs_gsi )
    {
        dprintk(XENLOG_G_ERR, "dom%d: map invalid irq %d\n", d->domain_id,
                index);
        return -EINVAL;
    }

nr_irqs_gsi < 112, and the check will fail.

So we need to pass the GSI to xc_physdev_map_pirq. To do that, we need
to discover the GSI number corresponding to the IRQ number.
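
To make that concrete, here is a minimal sketch of the flow this series aims
for on the libxl side, assuming the xc_physdev_gsi_from_irq() helper added
earlier in the series; the function name and surrounding plumbing are made up
for illustration and this is not the literal pci_add_dm_done() code:

    #include <stdio.h>
    #include <xenctrl.h>

    /* Read .../<SBDF>/irq from sysfs, translate the Linux IRQ to a GSI and
     * map it for the guest.  Error handling trimmed for brevity. */
    static int map_device_gsi(xc_interface *xch, uint32_t domid,
                              const char *sysfs_irq_path)
    {
        unsigned int irq = 0;
        int pirq, r;
        FILE *f = fopen(sysfs_irq_path, "r");

        if (!f)
            return -1;
        r = fscanf(f, "%u", &irq);
        fclose(f);
        if (r != 1 || !irq)
            return 0;                      /* no legacy (INTx) interrupt */

        /* sysfs gives the Linux IRQ (112 here); Xen wants the GSI (28). */
        irq = xc_physdev_gsi_from_irq(xch, irq);

        /* MAP_PIRQ_TYPE_GSI: Xen checks this index against nr_irqs_gsi. */
        pirq = irq;
        return xc_physdev_map_pirq(xch, domid, irq, &pirq);
    }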
Roger Pau Monne March 16, 2023, 8:54 a.m. UTC | #4
On Wed, Mar 15, 2023 at 05:44:12PM -0700, Stefano Stabellini wrote:
> On Wed, 15 Mar 2023, Roger Pau Monné wrote:
> > On Sun, Mar 12, 2023 at 03:54:55PM +0800, Huang Rui wrote:
> > > From: Chen Jiqian <Jiqian.Chen@amd.com>
> > > 
> > > Use new xc_physdev_gsi_from_irq to get the GSI number
> > > 
> > > Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
> > > Signed-off-by: Huang Rui <ray.huang@amd.com>
> > > ---
> > >  tools/libs/light/libxl_pci.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
> > > index f4c4f17545..47cf2799bf 100644
> > > --- a/tools/libs/light/libxl_pci.c
> > > +++ b/tools/libs/light/libxl_pci.c
> > > @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc,
> > >          goto out_no_irq;
> > >      }
> > >      if ((fscanf(f, "%u", &irq) == 1) && irq) {
> > > +        irq = xc_physdev_gsi_from_irq(ctx->xch, irq);
> > 
> > This is just a shot in the dark, because I don't really have enough
> > context to understand what's going on here, but see below.
> > 
> > I've taken a look at this on my box, and it seems like on
> > dom0 the value returned by /sys/bus/pci/devices/SBDF/irq is not
> > very consistent.
> > 
> > If devices are in use by a driver the irq sysfs node reports either
> > the GSI irq or the MSI IRQ (in case a single MSI interrupt is
> > setup).
> > 
> > It seems like pciback in Linux does something to report the correct
> > value:
> > 
> > root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq
> > 74
> > root@lcy2-dt107:~# xl pci-assignable-add 00:14.0
> > root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq
> > 16
> > 
> > As you can see, making the device assignable changed the value
> > reported by the irq node to be the GSI instead of the MSI IRQ, I would
> > think you are missing something similar in the PVH setup (some pciback
> > magic)?
> > 
> > Albeit I have no idea why you would need to translate from IRQ to GSI
> > in the way you do in this and related patches, because I'm missing the
> > context.
> 
> As I mention in another email, also keep in mind that we need QEMU to
> work and QEMU calls:
> 1) xc_physdev_map_pirq (this is also called from libxl)
> 2) xc_domain_bind_pt_pci_irq

Those would be fine, and don't need any translation since it's QEMU
the one that creates and maps the MSI(-X) interrupts, so it knows the
PIRQ without requiring any translation because it has been allocated
by QEMU itself.

GSI is kind of special because it's a fixed (legacy) interrupt mapped
to an IO-APIC pin and assigned to the device by the firmware.  The
setup in that case gets done by the toolstack (libxl) because the
mapping is immutable for the lifetime of the domain.
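
For reference, the GSI-side setup libxl currently does boils down to roughly
the following (a condensed sketch of pci_add_dm_done(), not the literal code,
with error handling dropped):

    /* pci_add_dm_done() reuses the same variable for the GSI and the
     * resulting pirq. */
    int pirq = gsi;
    int r = xc_physdev_map_pirq(ctx->xch, domid, gsi, &pirq);

    if (!r)
        /* Let the domain use the pirq the GSI was mapped to. */
        r = xc_domain_irq_permission(ctx->xch, domid, pirq, 1);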

> In this case IRQ != GSI (IRQ == 112, GSI == 28). Sysfs returns the IRQ
> in Linux (112), but actually xc_physdev_map_pirq expects the GSI, not
> the IRQ.

I think the real question here is why on this scenario IRQ != GSI for
GSI interrupts.  On one of my systems when booted as PVH dom0 with
pci=nomsi I get from /proc/interrupt:

  8:          0          0          0          0          0          0          0   IO-APIC   8-edge      rtc0
  9:          1          0          0          0          0          0          0   IO-APIC   9-fasteoi   acpi
 16:          0          0       8373          0          0          0          0   IO-APIC  16-fasteoi   i801_smbus, xhci-hcd:usb1, ahci[0000:00:17.0]
 17:          0          0          0        542          0          0          0   IO-APIC  17-fasteoi   eth0
 24:       4112          0          0          0          0          0          0  xen-percpu    -virq      timer0
 25:        352          0          0          0          0          0          0  xen-percpu    -ipi       resched0
 26:       6635          0          0          0          0          0          0  xen-percpu    -ipi       callfunc0

So GSI == IRQ, and non GSI interrupts start past the last GSI, which
is 23 on this system because it has a single IO-APIC with 24 pins.

We need to figure out what causes GSIs to be mapped to IRQs != GSI on
your system, and then we can decide how to fix this.  I would expect
it could be fixed so that IRQ == GSI (like it's on PV dom0), and none
of this translation to be necessary.

Can you paste the output of /proc/interrupts on that system that has a
GSI not identity mapped to an IRQ?

> If you look at the implementation of xc_physdev_map_pirq,
> you'll see that the type is "MAP_PIRQ_TYPE_GSI", and also see the check in Xen
> xen/arch/x86/irq.c:allocate_and_map_gsi_pirq:
> 
>     if ( index < 0 || index >= nr_irqs_gsi )
>     {
>         dprintk(XENLOG_G_ERR, "dom%d: map invalid irq %d\n", d->domain_id,
>                 index);
>         return -EINVAL;
>     }
> 
> nr_irqs_gsi < 112, and the check will fail.
> 
> So we need to pass the GSI to xc_physdev_map_pirq. To do that, we need
> to discover the GSI number corresponding to the IRQ number.

Right, see above, I think the real problem is that IRQ != GSI on your
Linux dom0 for some reason.

Thanks, Roger.
Jan Beulich March 16, 2023, 8:55 a.m. UTC | #5
On 16.03.2023 01:44, Stefano Stabellini wrote:
> On Wed, 15 Mar 2023, Roger Pau Monné wrote:
>> On Sun, Mar 12, 2023 at 03:54:55PM +0800, Huang Rui wrote:
>>> From: Chen Jiqian <Jiqian.Chen@amd.com>
>>>
>>> Use new xc_physdev_gsi_from_irq to get the GSI number
>>>
>>> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>> ---
>>>  tools/libs/light/libxl_pci.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
>>> index f4c4f17545..47cf2799bf 100644
>>> --- a/tools/libs/light/libxl_pci.c
>>> +++ b/tools/libs/light/libxl_pci.c
>>> @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>          goto out_no_irq;
>>>      }
>>>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
>>> +        irq = xc_physdev_gsi_from_irq(ctx->xch, irq);
>>
>> This is just a shot in the dark, because I don't really have enough
>> context to understand what's going on here, but see below.
>>
>> I've taken a look at this on my box, and it seems like on
>> dom0 the value returned by /sys/bus/pci/devices/SBDF/irq is not
>> very consistent.
>>
>> If devices are in use by a driver the irq sysfs node reports either
>> the GSI irq or the MSI IRQ (in case a single MSI interrupt is
>> setup).
>>
>> It seems like pciback in Linux does something to report the correct
>> value:
>>
>> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq
>> 74
>> root@lcy2-dt107:~# xl pci-assignable-add 00:14.0
>> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq
>> 16
>>
>> As you can see, making the device assignable changed the value
>> reported by the irq node to be the GSI instead of the MSI IRQ, I would
>> think you are missing something similar in the PVH setup (some pciback
>> magic)?
>>
>> Albeit I have no idea why you would need to translate from IRQ to GSI
>> in the way you do in this and related patches, because I'm missing the
>> context.
> 
> As I mention in another email, also keep in mind that we need QEMU to
> work and QEMU calls:
> 1) xc_physdev_map_pirq (this is also called from libxl)
> 2) xc_domain_bind_pt_pci_irq
> 
> 
> In this case IRQ != GSI (IRQ == 112, GSI == 28). Sysfs returns the IRQ
> in Linux (112), but actually xc_physdev_map_pirq expects the GSI, not
> the IRQ. If you look at the implementation of xc_physdev_map_pirq,
> you'll see that the type is "MAP_PIRQ_TYPE_GSI", and also see the check in Xen
> xen/arch/x86/irq.c:allocate_and_map_gsi_pirq:
> 
>     if ( index < 0 || index >= nr_irqs_gsi )
>     {
>         dprintk(XENLOG_G_ERR, "dom%d: map invalid irq %d\n", d->domain_id,
>                 index);
>         return -EINVAL;
>     }
> 
> nr_irqs_gsi < 112, and the check will fail.
> 
> So we need to pass the GSI to xc_physdev_map_pirq. To do that, we need
> to discover the GSI number corresponding to the IRQ number.

That's one possible approach. Another could be (making a lot of assumptions)
that a PVH Dom0 would pass in the IRQ it knows for this interrupt and Xen
then translates that to GSI, knowing that PVH doesn't have (host) GSIs
exposed to it.

Jan
Roger Pau Monne March 16, 2023, 9:27 a.m. UTC | #6
On Thu, Mar 16, 2023 at 09:55:03AM +0100, Jan Beulich wrote:
> On 16.03.2023 01:44, Stefano Stabellini wrote:
> > On Wed, 15 Mar 2023, Roger Pau Monné wrote:
> >> On Sun, Mar 12, 2023 at 03:54:55PM +0800, Huang Rui wrote:
> >>> From: Chen Jiqian <Jiqian.Chen@amd.com>
> >>>
> >>> Use new xc_physdev_gsi_from_irq to get the GSI number
> >>>
> >>> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
> >>> Signed-off-by: Huang Rui <ray.huang@amd.com>
> >>> ---
> >>>  tools/libs/light/libxl_pci.c | 1 +
> >>>  1 file changed, 1 insertion(+)
> >>>
> >>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
> >>> index f4c4f17545..47cf2799bf 100644
> >>> --- a/tools/libs/light/libxl_pci.c
> >>> +++ b/tools/libs/light/libxl_pci.c
> >>> @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc,
> >>>          goto out_no_irq;
> >>>      }
> >>>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
> >>> +        irq = xc_physdev_gsi_from_irq(ctx->xch, irq);
> >>
> >> This is just a shot in the dark, because I don't really have enough
> >> context to understand what's going on here, but see below.
> >>
> >> I've taken a look at this on my box, and it seems like on
> >> dom0 the value returned by /sys/bus/pci/devices/SBDF/irq is not
> >> very consistent.
> >>
> >> If devices are in use by a driver the irq sysfs node reports either
> >> the GSI irq or the MSI IRQ (in case a single MSI interrupt is
> >> setup).
> >>
> >> It seems like pciback in Linux does something to report the correct
> >> value:
> >>
> >> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq
> >> 74
> >> root@lcy2-dt107:~# xl pci-assignable-add 00:14.0
> >> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq
> >> 16
> >>
> >> As you can see, making the device assignable changed the value
> >> reported by the irq node to be the GSI instead of the MSI IRQ, I would
> >> think you are missing something similar in the PVH setup (some pciback
> >> magic)?
> >>
> >> Albeit I have no idea why you would need to translate from IRQ to GSI
> >> in the way you do in this and related patches, because I'm missing the
> >> context.
> > 
> > As I mention in another email, also keep in mind that we need QEMU to
> > work and QEMU calls:
> > 1) xc_physdev_map_pirq (this is also called from libxl)
> > 2) xc_domain_bind_pt_pci_irq
> > 
> > 
> > In this case IRQ != GSI (IRQ == 112, GSI == 28). Sysfs returns the IRQ
> > in Linux (112), but actually xc_physdev_map_pirq expects the GSI, not
> > the IRQ. If you look at the implementation of xc_physdev_map_pirq,
> > you'll see that the type is "MAP_PIRQ_TYPE_GSI", and also see the check in Xen
> > xen/arch/x86/irq.c:allocate_and_map_gsi_pirq:
> > 
> >     if ( index < 0 || index >= nr_irqs_gsi )
> >     {
> >         dprintk(XENLOG_G_ERR, "dom%d: map invalid irq %d\n", d->domain_id,
> >                 index);
> >         return -EINVAL;
> >     }
> > 
> > nr_irqs_gsi < 112, and the check will fail.
> > 
> > So we need to pass the GSI to xc_physdev_map_pirq. To do that, we need
> > to discover the GSI number corresponding to the IRQ number.
> 
> That's one possible approach. Another could be (making a lot of assumptions)
> that a PVH Dom0 would pass in the IRQ it knows for this interrupt and Xen
> then translates that to GSI, knowing that PVH doesn't have (host) GSIs
> exposed to it.

I don't think Xen can translate a Linux IRQ to a GSI, as that's a
Linux abstraction Xen has no part in.

The GSIs exposed to a PVH dom0 are the native (host) ones, as we
create an emulated IO-APIC topology that mimics the physical one.

Question here is why Linux ends up with a IRQ != GSI, as it's my
understanding on Linux GSIs will always be identity mapped to IRQs, and
the IRQ space up to the last possible GSI is explicitly reserved for
this purpose.

Thanks, Roger.
Jan Beulich March 16, 2023, 9:42 a.m. UTC | #7
On 16.03.2023 10:27, Roger Pau Monné wrote:
> On Thu, Mar 16, 2023 at 09:55:03AM +0100, Jan Beulich wrote:
>> On 16.03.2023 01:44, Stefano Stabellini wrote:
>>> On Wed, 15 Mar 2023, Roger Pau Monné wrote:
>>>> On Sun, Mar 12, 2023 at 03:54:55PM +0800, Huang Rui wrote:
>>>>> From: Chen Jiqian <Jiqian.Chen@amd.com>
>>>>>
>>>>> Use new xc_physdev_gsi_from_irq to get the GSI number
>>>>>
>>>>> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
>>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>>>> ---
>>>>>  tools/libs/light/libxl_pci.c | 1 +
>>>>>  1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
>>>>> index f4c4f17545..47cf2799bf 100644
>>>>> --- a/tools/libs/light/libxl_pci.c
>>>>> +++ b/tools/libs/light/libxl_pci.c
>>>>> @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>>          goto out_no_irq;
>>>>>      }
>>>>>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
>>>>> +        irq = xc_physdev_gsi_from_irq(ctx->xch, irq);
>>>>
>>>> This is just a shot in the dark, because I don't really have enough
>>>> context to understand what's going on here, but see below.
>>>>
>>>> I've taken a look at this on my box, and it seems like on
>>>> dom0 the value returned by /sys/bus/pci/devices/SBDF/irq is not
>>>> very consistent.
>>>>
>>>> If devices are in use by a driver the irq sysfs node reports either
>>>> the GSI irq or the MSI IRQ (in case a single MSI interrupt is
>>>> setup).
>>>>
>>>> It seems like pciback in Linux does something to report the correct
>>>> value:
>>>>
>>>> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq
>>>> 74
>>>> root@lcy2-dt107:~# xl pci-assignable-add 00:14.0
>>>> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq
>>>> 16
>>>>
>>>> As you can see, making the device assignable changed the value
>>>> reported by the irq node to be the GSI instead of the MSI IRQ, I would
>>>> think you are missing something similar in the PVH setup (some pciback
>>>> magic)?
>>>>
>>>> Albeit I have no idea why you would need to translate from IRQ to GSI
>>>> in the way you do in this and related patches, because I'm missing the
>>>> context.
>>>
>>> As I mention in another email, also keep in mind that we need QEMU to
>>> work and QEMU calls:
>>> 1) xc_physdev_map_pirq (this is also called from libxl)
>>> 2) xc_domain_bind_pt_pci_irq
>>>
>>>
>>> In this case IRQ != GSI (IRQ == 112, GSI == 28). Sysfs returns the IRQ
>>> in Linux (112), but actually xc_physdev_map_pirq expects the GSI, not
>>> the IRQ. If you look at the implementation of xc_physdev_map_pirq,
>>> you'll see that the type is "MAP_PIRQ_TYPE_GSI", and also see the check in Xen
>>> xen/arch/x86/irq.c:allocate_and_map_gsi_pirq:
>>>
>>>     if ( index < 0 || index >= nr_irqs_gsi )
>>>     {
>>>         dprintk(XENLOG_G_ERR, "dom%d: map invalid irq %d\n", d->domain_id,
>>>                 index);
>>>         return -EINVAL;
>>>     }
>>>
>>> nr_irqs_gsi < 112, and the check will fail.
>>>
>>> So we need to pass the GSI to xc_physdev_map_pirq. To do that, we need
>>> to discover the GSI number corresponding to the IRQ number.
>>
>> That's one possible approach. Another could be (making a lot of assumptions)
>> that a PVH Dom0 would pass in the IRQ it knows for this interrupt and Xen
>> then translates that to GSI, knowing that PVH doesn't have (host) GSIs
>> exposed to it.
> 
> I don't think Xen can translate a Linux IRQ to a GSI, as that's a
> Linux abstraction Xen has no part in.

Well, I was talking about whatever Dom0 and Xen use to communicate. I.e.
if at all I might have meant pIRQ, but now that you mention ...

> The GSIs exposed to a PVH dom0 are the native (host) ones, as we
> create an emulated IO-APIC topology that mimics the physical one.
> 
> Question here is why Linux ends up with a IRQ != GSI, as it's my
> understanding on Linux GSIs will always be identity mapped to IRQs, and
> the IRQ space up to the last possible GSI is explicitly reserved for
> this purpose.

... this I guess pIRQ was a PV-only concept, and it really ought to be
GSI in the PVH case. So yes, it then all boils down to that Linux-
internal question.

Jan
Stefano Stabellini March 16, 2023, 11:19 p.m. UTC | #8
On Thu, 16 Mar 2023, Jan Beulich wrote:
> On 16.03.2023 10:27, Roger Pau Monné wrote:
> > On Thu, Mar 16, 2023 at 09:55:03AM +0100, Jan Beulich wrote:
> >> On 16.03.2023 01:44, Stefano Stabellini wrote:
> >>> On Wed, 15 Mar 2023, Roger Pau Monné wrote:
> >>>> On Sun, Mar 12, 2023 at 03:54:55PM +0800, Huang Rui wrote:
> >>>>> From: Chen Jiqian <Jiqian.Chen@amd.com>
> >>>>>
> >>>>> Use new xc_physdev_gsi_from_irq to get the GSI number
> >>>>>
> >>>>> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
> >>>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
> >>>>> ---
> >>>>>  tools/libs/light/libxl_pci.c | 1 +
> >>>>>  1 file changed, 1 insertion(+)
> >>>>>
> >>>>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
> >>>>> index f4c4f17545..47cf2799bf 100644
> >>>>> --- a/tools/libs/light/libxl_pci.c
> >>>>> +++ b/tools/libs/light/libxl_pci.c
> >>>>> @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc,
> >>>>>          goto out_no_irq;
> >>>>>      }
> >>>>>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
> >>>>> +        irq = xc_physdev_gsi_from_irq(ctx->xch, irq);
> >>>>
> >>>> This is just a shot in the dark, because I don't really have enough
> >>>> context to understand what's going on here, but see below.
> >>>>
> >>>> I've taken a look at this on my box, and it seems like on
> >>>> dom0 the value returned by /sys/bus/pci/devices/SBDF/irq is not
> >>>> very consistent.
> >>>>
> >>>> If devices are in use by a driver the irq sysfs node reports either
> >>>> the GSI irq or the MSI IRQ (in case a single MSI interrupt is
> >>>> setup).
> >>>>
> >>>> It seems like pciback in Linux does something to report the correct
> >>>> value:
> >>>>
> >>>> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq
> >>>> 74
> >>>> root@lcy2-dt107:~# xl pci-assignable-add 00:14.0
> >>>> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq
> >>>> 16
> >>>>
> >>>> As you can see, making the device assignable changed the value
> >>>> reported by the irq node to be the GSI instead of the MSI IRQ, I would
> >>>> think you are missing something similar in the PVH setup (some pciback
> >>>> magic)?
> >>>>
> >>>> Albeit I have no idea why you would need to translate from IRQ to GSI
> >>>> in the way you do in this and related patches, because I'm missing the
> >>>> context.
> >>>
> >>> As I mention in another email, also keep in mind that we need QEMU to
> >>> work and QEMU calls:
> >>> 1) xc_physdev_map_pirq (this is also called from libxl)
> >>> 2) xc_domain_bind_pt_pci_irq
> >>>
> >>>
> >>> In this case IRQ != GSI (IRQ == 112, GSI == 28). Sysfs returns the IRQ
> >>> in Linux (112), but actually xc_physdev_map_pirq expects the GSI, not
> >>> the IRQ. If you look at the implementation of xc_physdev_map_pirq,
> >>> you'll see that the type is "MAP_PIRQ_TYPE_GSI", and also see the check in Xen
> >>> xen/arch/x86/irq.c:allocate_and_map_gsi_pirq:
> >>>
> >>>     if ( index < 0 || index >= nr_irqs_gsi )
> >>>     {
> >>>         dprintk(XENLOG_G_ERR, "dom%d: map invalid irq %d\n", d->domain_id,
> >>>                 index);
> >>>         return -EINVAL;
> >>>     }
> >>>
> >>> nr_irqs_gsi < 112, and the check will fail.
> >>>
> >>> So we need to pass the GSI to xc_physdev_map_pirq. To do that, we need
> >>> to discover the GSI number corresponding to the IRQ number.
> >>
> >> That's one possible approach. Another could be (making a lot of assumptions)
> >> that a PVH Dom0 would pass in the IRQ it knows for this interrupt and Xen
> >> then translates that to GSI, knowing that PVH doesn't have (host) GSIs
> >> exposed to it.
> > 
> > I don't think Xen can translate a Linux IRQ to a GSI, as that's a
> > Linux abstraction Xen has no part in.
> 
> Well, I was talking about whatever Dom0 and Xen use to communicate. I.e.
> if at all I might have meant pIRQ, but now that you mention ...
> 
> > The GSIs exposed to a PVH dom0 are the native (host) ones, as we
> > create an emulated IO-APIC topology that mimics the physical one.
> > 
> > Question here is why Linux ends up with a IRQ != GSI, as it's my
> > understanding on Linux GSIs will always be identity mapped to IRQs, and
> > the IRQ space up to the last possible GSI is explicitly reserved for
> > this purpose.
> 
> ... this I guess pIRQ was a PV-only concept, and it really ought to be
> GSI in the PVH case. So yes, it then all boils down to that Linux-
> internal question.

Excellent question but we'll have to wait for Ray as he is the one with
access to the hardware. But I have this data I can share in the
meantime:

[    1.260378] IRQ to pin mappings:
[    1.260387] IRQ1 -> 0:1
[    1.260395] IRQ2 -> 0:2
[    1.260403] IRQ3 -> 0:3
[    1.260410] IRQ4 -> 0:4
[    1.260418] IRQ5 -> 0:5
[    1.260425] IRQ6 -> 0:6
[    1.260432] IRQ7 -> 0:7
[    1.260440] IRQ8 -> 0:8
[    1.260447] IRQ9 -> 0:9
[    1.260455] IRQ10 -> 0:10
[    1.260462] IRQ11 -> 0:11
[    1.260470] IRQ12 -> 0:12
[    1.260478] IRQ13 -> 0:13
[    1.260485] IRQ14 -> 0:14
[    1.260493] IRQ15 -> 0:15
[    1.260505] IRQ106 -> 1:8
[    1.260513] IRQ112 -> 1:4
[    1.260521] IRQ116 -> 1:13
[    1.260529] IRQ117 -> 1:14
[    1.260537] IRQ118 -> 1:15
[    1.260544] .................................... done.


And I think Ray traced the point in Linux where Linux gives us an IRQ ==
112 (which is the one causing issues):

__acpi_register_gsi->
        acpi_register_gsi_ioapic->
                mp_map_gsi_to_irq->
                        mp_map_pin_to_irq->
                                __irq_resolve_mapping()

        if (likely(data)) {
                desc = irq_data_to_desc(data);
                if (irq)
                        *irq = data->irq;
                /* this IRQ is 112, IO-APIC-34 domain */
        }
Jan Beulich March 17, 2023, 8:39 a.m. UTC | #9
On 17.03.2023 00:19, Stefano Stabellini wrote:
> On Thu, 16 Mar 2023, Jan Beulich wrote:
>> So yes, it then all boils down to that Linux-
>> internal question.
> 
> Excellent question but we'll have to wait for Ray as he is the one with
> access to the hardware. But I have this data I can share in the
> meantime:
> 
> [    1.260378] IRQ to pin mappings:
> [    1.260387] IRQ1 -> 0:1
> [    1.260395] IRQ2 -> 0:2
> [    1.260403] IRQ3 -> 0:3
> [    1.260410] IRQ4 -> 0:4
> [    1.260418] IRQ5 -> 0:5
> [    1.260425] IRQ6 -> 0:6
> [    1.260432] IRQ7 -> 0:7
> [    1.260440] IRQ8 -> 0:8
> [    1.260447] IRQ9 -> 0:9
> [    1.260455] IRQ10 -> 0:10
> [    1.260462] IRQ11 -> 0:11
> [    1.260470] IRQ12 -> 0:12
> [    1.260478] IRQ13 -> 0:13
> [    1.260485] IRQ14 -> 0:14
> [    1.260493] IRQ15 -> 0:15
> [    1.260505] IRQ106 -> 1:8
> [    1.260513] IRQ112 -> 1:4
> [    1.260521] IRQ116 -> 1:13
> [    1.260529] IRQ117 -> 1:14
> [    1.260537] IRQ118 -> 1:15
> [    1.260544] .................................... done.

And what does Linux think are IRQs 16 ... 105? Have you compared with
Linux running baremetal on the same hardware?

Jan

> And I think Ray traced the point in Linux where Linux gives us an IRQ ==
> 112 (which is the one causing issues):
> 
> __acpi_register_gsi->
>         acpi_register_gsi_ioapic->
>                 mp_map_gsi_to_irq->
>                         mp_map_pin_to_irq->
>                                 __irq_resolve_mapping()
> 
>         if (likely(data)) {
>                 desc = irq_data_to_desc(data);
>                 if (irq)
>                         *irq = data->irq;
>                 /* this IRQ is 112, IO-APIC-34 domain */
>         }
Roger Pau Monne March 17, 2023, 9:51 a.m. UTC | #10
On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote:
> On 17.03.2023 00:19, Stefano Stabellini wrote:
> > On Thu, 16 Mar 2023, Jan Beulich wrote:
> >> So yes, it then all boils down to that Linux-
> >> internal question.
> > 
> > Excellent question but we'll have to wait for Ray as he is the one with
> > access to the hardware. But I have this data I can share in the
> > meantime:
> > 
> > [    1.260378] IRQ to pin mappings:
> > [    1.260387] IRQ1 -> 0:1
> > [    1.260395] IRQ2 -> 0:2
> > [    1.260403] IRQ3 -> 0:3
> > [    1.260410] IRQ4 -> 0:4
> > [    1.260418] IRQ5 -> 0:5
> > [    1.260425] IRQ6 -> 0:6
> > [    1.260432] IRQ7 -> 0:7
> > [    1.260440] IRQ8 -> 0:8
> > [    1.260447] IRQ9 -> 0:9
> > [    1.260455] IRQ10 -> 0:10
> > [    1.260462] IRQ11 -> 0:11
> > [    1.260470] IRQ12 -> 0:12
> > [    1.260478] IRQ13 -> 0:13
> > [    1.260485] IRQ14 -> 0:14
> > [    1.260493] IRQ15 -> 0:15
> > [    1.260505] IRQ106 -> 1:8
> > [    1.260513] IRQ112 -> 1:4
> > [    1.260521] IRQ116 -> 1:13
> > [    1.260529] IRQ117 -> 1:14
> > [    1.260537] IRQ118 -> 1:15
> > [    1.260544] .................................... done.
> 
> And what does Linux think are IRQs 16 ... 105? Have you compared with
> Linux running baremetal on the same hardware?

So I have some emails from Ray from the time he was looking into this,
and on Linux dom0 PVH dmesg there is:

[    0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23
[    0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55

So it seems the vIO-APIC data provided by Xen to dom0 is at least
consistent.
 
> > And I think Ray traced the point in Linux where Linux gives us an IRQ ==
> > 112 (which is the one causing issues):
> > 
> > __acpi_register_gsi->
> >         acpi_register_gsi_ioapic->
> >                 mp_map_gsi_to_irq->
> >                         mp_map_pin_to_irq->
> >                                 __irq_resolve_mapping()
> > 
> >         if (likely(data)) {
> >                 desc = irq_data_to_desc(data);
> >                 if (irq)
> >                         *irq = data->irq;
> >                 /* this IRQ is 112, IO-APIC-34 domain */
> >         }


Could this all be a result of patch 4/5 in the Linux series ("[RFC
PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different
__acpi_register_gsi hook is installed for PVH in order to setup GSIs
using PHYSDEV ops instead of doing it natively from the IO-APIC?

FWIW, the introduced function in that patch
(acpi_register_gsi_xen_pvh()) seems to unconditionally call
acpi_register_gsi_ioapic() without checking if the GSI is already
registered, which might lead to multiple IRQs being allocated for the
same underlying GSI?

As I commented there, I think that approach is wrong.  If the GSI has
not been mapped in Xen (because dom0 hasn't unmasked the respective
IO-APIC pin) we should add some logic in the toolstack to map it
before attempting to bind.

Thanks, Roger.
Stefano Stabellini March 17, 2023, 6:15 p.m. UTC | #11
On Fri, 17 Mar 2023, Roger Pau Monné wrote:
> On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote:
> > On 17.03.2023 00:19, Stefano Stabellini wrote:
> > > On Thu, 16 Mar 2023, Jan Beulich wrote:
> > >> So yes, it then all boils down to that Linux-
> > >> internal question.
> > > 
> > > Excellent question but we'll have to wait for Ray as he is the one with
> > > access to the hardware. But I have this data I can share in the
> > > meantime:
> > > 
> > > [    1.260378] IRQ to pin mappings:
> > > [    1.260387] IRQ1 -> 0:1
> > > [    1.260395] IRQ2 -> 0:2
> > > [    1.260403] IRQ3 -> 0:3
> > > [    1.260410] IRQ4 -> 0:4
> > > [    1.260418] IRQ5 -> 0:5
> > > [    1.260425] IRQ6 -> 0:6
> > > [    1.260432] IRQ7 -> 0:7
> > > [    1.260440] IRQ8 -> 0:8
> > > [    1.260447] IRQ9 -> 0:9
> > > [    1.260455] IRQ10 -> 0:10
> > > [    1.260462] IRQ11 -> 0:11
> > > [    1.260470] IRQ12 -> 0:12
> > > [    1.260478] IRQ13 -> 0:13
> > > [    1.260485] IRQ14 -> 0:14
> > > [    1.260493] IRQ15 -> 0:15
> > > [    1.260505] IRQ106 -> 1:8
> > > [    1.260513] IRQ112 -> 1:4
> > > [    1.260521] IRQ116 -> 1:13
> > > [    1.260529] IRQ117 -> 1:14
> > > [    1.260537] IRQ118 -> 1:15
> > > [    1.260544] .................................... done.
> > 
> > And what does Linux think are IRQs 16 ... 105? Have you compared with
> > Linux running baremetal on the same hardware?
> 
> So I have some emails from Ray from the time he was looking into this,
> and on Linux dom0 PVH dmesg there is:
> 
> [    0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23
> [    0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55
> 
> So it seems the vIO-APIC data provided by Xen to dom0 is at least
> consistent.
>  
> > > And I think Ray traced the point in Linux where Linux gives us an IRQ ==
> > > 112 (which is the one causing issues):
> > > 
> > > __acpi_register_gsi->
> > >         acpi_register_gsi_ioapic->
> > >                 mp_map_gsi_to_irq->
> > >                         mp_map_pin_to_irq->
> > >                                 __irq_resolve_mapping()
> > > 
> > >         if (likely(data)) {
> > >                 desc = irq_data_to_desc(data);
> > >                 if (irq)
> > >                         *irq = data->irq;
> > >                 /* this IRQ is 112, IO-APIC-34 domain */
> > >         }
> 
> 
> Could this all be a result of patch 4/5 in the Linux series ("[RFC
> PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different
> __acpi_register_gsi hook is installed for PVH in order to setup GSIs
> using PHYSDEV ops instead of doing it natively from the IO-APIC?
> 
> FWIW, the introduced function in that patch
> (acpi_register_gsi_xen_pvh()) seems to unconditionally call
> acpi_register_gsi_ioapic() without checking if the GSI is already
> registered, which might lead to multiple IRQs being allocated for the
> same underlying GSI?

I understand this point and I think it needs investigating.


> As I commented there, I think that approach is wrong.  If the GSI has
> not been mapped in Xen (because dom0 hasn't unmasked the respective
> IO-APIC pin) we should add some logic in the toolstack to map it
> before attempting to bind.

But this statement confuses me. The toolstack doesn't get involved in
IRQ setup for PCI devices for HVM guests? Keep in mind that this is a
regular HVM guest creation on PVH Dom0, so normally the IRQ setup is
done by QEMU, and QEMU already calls xc_physdev_map_pirq and
xc_domain_bind_pt_pci_irq. So I don't follow your statement about "the
toolstack to map it before attempting to bind".
Roger Pau Monne March 17, 2023, 7:48 p.m. UTC | #12
On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote:
> On Fri, 17 Mar 2023, Roger Pau Monné wrote:
> > On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote:
> > > On 17.03.2023 00:19, Stefano Stabellini wrote:
> > > > On Thu, 16 Mar 2023, Jan Beulich wrote:
> > > >> So yes, it then all boils down to that Linux-
> > > >> internal question.
> > > > 
> > > > Excellent question but we'll have to wait for Ray as he is the one with
> > > > access to the hardware. But I have this data I can share in the
> > > > meantime:
> > > > 
> > > > [    1.260378] IRQ to pin mappings:
> > > > [    1.260387] IRQ1 -> 0:1
> > > > [    1.260395] IRQ2 -> 0:2
> > > > [    1.260403] IRQ3 -> 0:3
> > > > [    1.260410] IRQ4 -> 0:4
> > > > [    1.260418] IRQ5 -> 0:5
> > > > [    1.260425] IRQ6 -> 0:6
> > > > [    1.260432] IRQ7 -> 0:7
> > > > [    1.260440] IRQ8 -> 0:8
> > > > [    1.260447] IRQ9 -> 0:9
> > > > [    1.260455] IRQ10 -> 0:10
> > > > [    1.260462] IRQ11 -> 0:11
> > > > [    1.260470] IRQ12 -> 0:12
> > > > [    1.260478] IRQ13 -> 0:13
> > > > [    1.260485] IRQ14 -> 0:14
> > > > [    1.260493] IRQ15 -> 0:15
> > > > [    1.260505] IRQ106 -> 1:8
> > > > [    1.260513] IRQ112 -> 1:4
> > > > [    1.260521] IRQ116 -> 1:13
> > > > [    1.260529] IRQ117 -> 1:14
> > > > [    1.260537] IRQ118 -> 1:15
> > > > [    1.260544] .................................... done.
> > > 
> > > And what does Linux think are IRQs 16 ... 105? Have you compared with
> > > Linux running baremetal on the same hardware?
> > 
> > So I have some emails from Ray from the time he was looking into this,
> > and on Linux dom0 PVH dmesg there is:
> > 
> > [    0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23
> > [    0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55
> > 
> > So it seems the vIO-APIC data provided by Xen to dom0 is at least
> > consistent.
> >  
> > > > And I think Ray traced the point in Linux where Linux gives us an IRQ ==
> > > > 112 (which is the one causing issues):
> > > > 
> > > > __acpi_register_gsi->
> > > >         acpi_register_gsi_ioapic->
> > > >                 mp_map_gsi_to_irq->
> > > >                         mp_map_pin_to_irq->
> > > >                                 __irq_resolve_mapping()
> > > > 
> > > >         if (likely(data)) {
> > > >                 desc = irq_data_to_desc(data);
> > > >                 if (irq)
> > > >                         *irq = data->irq;
> > > >                 /* this IRQ is 112, IO-APIC-34 domain */
> > > >         }
> > 
> > 
> > Could this all be a result of patch 4/5 in the Linux series ("[RFC
> > PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different
> > __acpi_register_gsi hook is installed for PVH in order to setup GSIs
> > using PHYSDEV ops instead of doing it natively from the IO-APIC?
> > 
> > FWIW, the introduced function in that patch
> > (acpi_register_gsi_xen_pvh()) seems to unconditionally call
> > acpi_register_gsi_ioapic() without checking if the GSI is already
> > registered, which might lead to multiple IRQs being allocated for the
> > same underlying GSI?
> 
> I understand this point and I think it needs investigating.
> 
> 
> > As I commented there, I think that approach is wrong.  If the GSI has
> > not been mapped in Xen (because dom0 hasn't unmasked the respective
> > IO-APIC pin) we should add some logic in the toolstack to map it
> > before attempting to bind.
> 
> But this statement confuses me. The toolstack doesn't get involved in
> IRQ setup for PCI devices for HVM guests?

It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call
to xc_physdev_map_pirq().  I'm not sure whether that's a remnant that
could be removed (maybe for qemu-trad only?) or it's also required by
QEMU upstream, I would have to investigate more.  It's my
understanding it's in pci_add_dm_done() where Ray was getting the
mismatched IRQ vs GSI number.

Thanks, Roger.
Stefano Stabellini March 17, 2023, 8:55 p.m. UTC | #13
On Fri, 17 Mar 2023, Roger Pau Monné wrote:
> On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote:
> > On Fri, 17 Mar 2023, Roger Pau Monné wrote:
> > > On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote:
> > > > On 17.03.2023 00:19, Stefano Stabellini wrote:
> > > > > On Thu, 16 Mar 2023, Jan Beulich wrote:
> > > > >> So yes, it then all boils down to that Linux-
> > > > >> internal question.
> > > > > 
> > > > > Excellent question but we'll have to wait for Ray as he is the one with
> > > > > access to the hardware. But I have this data I can share in the
> > > > > meantime:
> > > > > 
> > > > > [    1.260378] IRQ to pin mappings:
> > > > > [    1.260387] IRQ1 -> 0:1
> > > > > [    1.260395] IRQ2 -> 0:2
> > > > > [    1.260403] IRQ3 -> 0:3
> > > > > [    1.260410] IRQ4 -> 0:4
> > > > > [    1.260418] IRQ5 -> 0:5
> > > > > [    1.260425] IRQ6 -> 0:6
> > > > > [    1.260432] IRQ7 -> 0:7
> > > > > [    1.260440] IRQ8 -> 0:8
> > > > > [    1.260447] IRQ9 -> 0:9
> > > > > [    1.260455] IRQ10 -> 0:10
> > > > > [    1.260462] IRQ11 -> 0:11
> > > > > [    1.260470] IRQ12 -> 0:12
> > > > > [    1.260478] IRQ13 -> 0:13
> > > > > [    1.260485] IRQ14 -> 0:14
> > > > > [    1.260493] IRQ15 -> 0:15
> > > > > [    1.260505] IRQ106 -> 1:8
> > > > > [    1.260513] IRQ112 -> 1:4
> > > > > [    1.260521] IRQ116 -> 1:13
> > > > > [    1.260529] IRQ117 -> 1:14
> > > > > [    1.260537] IRQ118 -> 1:15
> > > > > [    1.260544] .................................... done.
> > > > 
> > > > And what does Linux think are IRQs 16 ... 105? Have you compared with
> > > > Linux running baremetal on the same hardware?
> > > 
> > > So I have some emails from Ray from the time he was looking into this,
> > > and on Linux dom0 PVH dmesg there is:
> > > 
> > > [    0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23
> > > [    0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55
> > > 
> > > So it seems the vIO-APIC data provided by Xen to dom0 is at least
> > > consistent.
> > >  
> > > > > And I think Ray traced the point in Linux where Linux gives us an IRQ ==
> > > > > 112 (which is the one causing issues):
> > > > > 
> > > > > __acpi_register_gsi->
> > > > >         acpi_register_gsi_ioapic->
> > > > >                 mp_map_gsi_to_irq->
> > > > >                         mp_map_pin_to_irq->
> > > > >                                 __irq_resolve_mapping()
> > > > > 
> > > > >         if (likely(data)) {
> > > > >                 desc = irq_data_to_desc(data);
> > > > >                 if (irq)
> > > > >                         *irq = data->irq;
> > > > >                 /* this IRQ is 112, IO-APIC-34 domain */
> > > > >         }
> > > 
> > > 
> > > Could this all be a result of patch 4/5 in the Linux series ("[RFC
> > > PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different
> > > __acpi_register_gsi hook is installed for PVH in order to setup GSIs
> > > using PHYSDEV ops instead of doing it natively from the IO-APIC?
> > > 
> > > FWIW, the introduced function in that patch
> > > (acpi_register_gsi_xen_pvh()) seems to unconditionally call
> > > acpi_register_gsi_ioapic() without checking if the GSI is already
> > > registered, which might lead to multiple IRQs being allocated for the
> > > same underlying GSI?
> > 
> > I understand this point and I think it needs investigating.
> > 
> > 
> > > As I commented there, I think that approach is wrong.  If the GSI has
> > > not been mapped in Xen (because dom0 hasn't unmasked the respective
> > > IO-APIC pin) we should add some logic in the toolstack to map it
> > > before attempting to bind.
> > 
> > But this statement confuses me. The toolstack doesn't get involved in
> > IRQ setup for PCI devices for HVM guests?
> 
> It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call
> to xc_physdev_map_pirq().  I'm not sure whether that's a remnant that
> could be removed (maybe for qemu-trad only?) or it's also required by
> QEMU upstream, I would have to investigate more.

You are right. I am not certain, but it seems like a mistake in the
toolstack to me. In theory, pci_add_dm_done should only be needed for PV
guests, not for HVM guests. I am not sure. But I can see the call to
xc_physdev_map_pirq you were referring to now.


> It's my understanding it's in pci_add_dm_done() where Ray was getting
> the mismatched IRQ vs GSI number.

I think the mismatch was actually caused by the xc_physdev_map_pirq call
from QEMU, which makes sense because in any case it should happen before
the same call done by pci_add_dm_done (pci_add_dm_done is called after
sending the pci passthrough QMP command to QEMU). So the first to hit
the IRQ!=GSI problem would be QEMU.
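
For the record, the QEMU side boils down to roughly the following (a
simplified sketch against xenctrl.h, not the actual hw/xen/xen_pt.c code;
the function and variable names are illustrative):

    /* Simplified sketch; this is not QEMU's real xen_pt code. */
    static int pt_setup_intx(xc_interface *xch, uint32_t domid,
                             int machine_irq,   /* from sysfs: Linux IRQ 112 */
                             uint8_t bus, uint8_t dev, uint8_t intx)
    {
        int pirq = machine_irq;
        /* Xen's allocate_and_map_gsi_pirq() expects a GSI below
         * nr_irqs_gsi, so with machine_irq == 112 this already fails. */
        int rc = xc_physdev_map_pirq(xch, domid, machine_irq, &pirq);

        if (rc)
            return rc;
        /* Only afterwards is the pirq bound to the guest's INTx line. */
        return xc_domain_bind_pt_pci_irq(xch, domid, pirq, bus, dev, intx);
    }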
Roger Pau Monne March 20, 2023, 3:16 p.m. UTC | #14
On Fri, Mar 17, 2023 at 01:55:08PM -0700, Stefano Stabellini wrote:
> On Fri, 17 Mar 2023, Roger Pau Monné wrote:
> > On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote:
> > > On Fri, 17 Mar 2023, Roger Pau Monné wrote:
> > > > On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote:
> > > > > On 17.03.2023 00:19, Stefano Stabellini wrote:
> > > > > > On Thu, 16 Mar 2023, Jan Beulich wrote:
> > > > > >> So yes, it then all boils down to that Linux-
> > > > > >> internal question.
> > > > > > 
> > > > > > Excellent question but we'll have to wait for Ray as he is the one with
> > > > > > access to the hardware. But I have this data I can share in the
> > > > > > meantime:
> > > > > > 
> > > > > > [    1.260378] IRQ to pin mappings:
> > > > > > [    1.260387] IRQ1 -> 0:1
> > > > > > [    1.260395] IRQ2 -> 0:2
> > > > > > [    1.260403] IRQ3 -> 0:3
> > > > > > [    1.260410] IRQ4 -> 0:4
> > > > > > [    1.260418] IRQ5 -> 0:5
> > > > > > [    1.260425] IRQ6 -> 0:6
> > > > > > [    1.260432] IRQ7 -> 0:7
> > > > > > [    1.260440] IRQ8 -> 0:8
> > > > > > [    1.260447] IRQ9 -> 0:9
> > > > > > [    1.260455] IRQ10 -> 0:10
> > > > > > [    1.260462] IRQ11 -> 0:11
> > > > > > [    1.260470] IRQ12 -> 0:12
> > > > > > [    1.260478] IRQ13 -> 0:13
> > > > > > [    1.260485] IRQ14 -> 0:14
> > > > > > [    1.260493] IRQ15 -> 0:15
> > > > > > [    1.260505] IRQ106 -> 1:8
> > > > > > [    1.260513] IRQ112 -> 1:4
> > > > > > [    1.260521] IRQ116 -> 1:13
> > > > > > [    1.260529] IRQ117 -> 1:14
> > > > > > [    1.260537] IRQ118 -> 1:15
> > > > > > [    1.260544] .................................... done.
> > > > > 
> > > > > And what does Linux think are IRQs 16 ... 105? Have you compared with
> > > > > Linux running baremetal on the same hardware?
> > > > 
> > > > So I have some emails from Ray from the time he was looking into this,
> > > > and on Linux dom0 PVH dmesg there is:
> > > > 
> > > > [    0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23
> > > > [    0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55
> > > > 
> > > > So it seems the vIO-APIC data provided by Xen to dom0 is at least
> > > > consistent.
> > > >  
> > > > > > And I think Ray traced the point in Linux where Linux gives us an IRQ ==
> > > > > > 112 (which is the one causing issues):
> > > > > > 
> > > > > > __acpi_register_gsi->
> > > > > >         acpi_register_gsi_ioapic->
> > > > > >                 mp_map_gsi_to_irq->
> > > > > >                         mp_map_pin_to_irq->
> > > > > >                                 __irq_resolve_mapping()
> > > > > > 
> > > > > >         if (likely(data)) {
> > > > > >                 desc = irq_data_to_desc(data);
> > > > > >                 if (irq)
> > > > > >                         *irq = data->irq;
> > > > > >                 /* this IRQ is 112, IO-APIC-34 domain */
> > > > > >         }
> > > > 
> > > > 
> > > > Could this all be a result of patch 4/5 in the Linux series ("[RFC
> > > > PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different
> > > > __acpi_register_gsi hook is installed for PVH in order to setup GSIs
> > > > using PHYSDEV ops instead of doing it natively from the IO-APIC?
> > > > 
> > > > FWIW, the introduced function in that patch
> > > > (acpi_register_gsi_xen_pvh()) seems to unconditionally call
> > > > acpi_register_gsi_ioapic() without checking if the GSI is already
> > > > registered, which might lead to multiple IRQs being allocated for the
> > > > same underlying GSI?
> > > 
> > > I understand this point and I think it needs investigating.
> > > 
> > > 
> > > > As I commented there, I think that approach is wrong.  If the GSI has
> > > > not been mapped in Xen (because dom0 hasn't unmasked the respective
> > > > IO-APIC pin) we should add some logic in the toolstack to map it
> > > > before attempting to bind.
> > > 
> > > But this statement confuses me. The toolstack doesn't get involved in
> > > IRQ setup for PCI devices for HVM guests?
> > 
> > It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call
> > to xc_physdev_map_pirq().  I'm not sure whether that's a remnant that
> > could be removed (maybe for qemu-trad only?) or it's also required by
> > QEMU upstream, I would have to investigate more.
> 
> You are right. I am not certain, but it seems like a mistake in the
> toolstack to me. In theory, pci_add_dm_done should only be needed for PV
> guests, not for HVM guests. I am not sure. But I can see the call to
> xc_physdev_map_pirq you were referring to now.
> 
> 
> > It's my understanding it's in pci_add_dm_done() where Ray was getting
> > the mismatched IRQ vs GSI number.
> 
> I think the mismatch was actually caused by the xc_physdev_map_pirq call
> from QEMU, which makes sense because in any case it should happen before
> the same call done by pci_add_dm_done (pci_add_dm_done is called after
> sending the pci passthrough QMP command to QEMU). So the first to hit
> the IRQ!=GSI problem would be QEMU.

I've been thinking about this a bit, and I think one of the possible
issues with the current handling of GSIs in a PVH dom0 is that GSIs
don't get registered until/unless they are unmasked.  I could see this
as a problem when doing passthrough: it's possible for a GSI (iow:
vIO-APIC pin) to never get unmasked on dom0, because the device
driver(s) are using MSI(-X) interrupts instead.  However, the IO-APIC
pin must be configured for it to be able to be mapped into a domU.

A possible solution is to propagate the vIO-APIC pin configuration
trigger/polarity when dom0 writes the low part of the redirection
table entry.

The patch below enables the usage of PHYSDEVOP_{un,}map_pirq from PVH
domains (I need to assert this is secure even for domUs) and also
propagates the vIO-APIC pin trigger/polarity mode on writes to the
low part of the RTE.  Such propagation leads to the following
interrupt setup in Xen:

IRQ:   0 vec:f0 IO-APIC-edge    status=000 aff:{0}/{0} arch/x86/time.c#timer_interrupt()
IRQ:   1 vec:38 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:   2 vec:a8 IO-APIC-edge    status=000 aff:{0-7}/{0-7} no_action()
IRQ:   3 vec:f1 IO-APIC-edge    status=000 aff:{0-7}/{0-7} drivers/char/ns16550.c#ns16550_interrupt()
IRQ:   4 vec:40 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:   5 vec:48 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:   6 vec:50 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:   7 vec:58 IO-APIC-edge    status=006 aff:{0-7}/{0} mapped, unbound
IRQ:   8 vec:60 IO-APIC-edge    status=010 aff:{0}/{0} in-flight=0 d0:  8(-M-)
IRQ:   9 vec:68 IO-APIC-edge    status=010 aff:{0}/{0} in-flight=0 d0:  9(-M-)
IRQ:  10 vec:70 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:  11 vec:78 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:  12 vec:88 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:  13 vec:90 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:  14 vec:98 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:  15 vec:a0 IO-APIC-edge    status=002 aff:{0-7}/{0} mapped, unbound
IRQ:  16 vec:b0 IO-APIC-edge    status=010 aff:{1}/{0-7} in-flight=0 d0: 16(-M-)
IRQ:  17 vec:b8 IO-APIC-edge    status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ:  18 vec:c0 IO-APIC-edge    status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ:  19 vec:c8 IO-APIC-edge    status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ:  20 vec:d0 IO-APIC-edge    status=010 aff:{1}/{0-7} in-flight=0 d0: 20(-M-)
IRQ:  21 vec:d8 IO-APIC-edge    status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ:  22 vec:e0 IO-APIC-edge    status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ:  23 vec:e8 IO-APIC-edge    status=002 aff:{0-7}/{0-7} mapped, unbound

Note how now all GSIs on my box are setup, even when not bound to
dom0 anymore.  The output without this patch looks like:

IRQ:   0 vec:f0 IO-APIC-edge    status=000 aff:{0}/{0} arch/x86/time.c#timer_interrupt()
IRQ:   1 vec:38 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:   3 vec:f1 IO-APIC-edge    status=000 aff:{0-7}/{0-7} drivers/char/ns16550.c#ns16550_interrupt()
IRQ:   4 vec:40 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:   5 vec:48 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:   6 vec:50 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:   7 vec:58 IO-APIC-edge    status=006 aff:{0}/{0} mapped, unbound
IRQ:   8 vec:d0 IO-APIC-edge    status=010 aff:{6}/{0-7} in-flight=0 d0:  8(-M-)
IRQ:   9 vec:a8 IO-APIC-level   status=010 aff:{2}/{0-7} in-flight=0 d0:  9(-M-)
IRQ:  10 vec:70 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:  11 vec:78 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:  12 vec:88 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:  13 vec:90 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:  14 vec:98 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:  15 vec:a0 IO-APIC-edge    status=002 aff:{0}/{0} mapped, unbound
IRQ:  16 vec:e0 IO-APIC-level   status=010 aff:{6}/{0-7} in-flight=0 d0: 16(-M-),d1: 16(-M-)
IRQ:  20 vec:d8 IO-APIC-level   status=010 aff:{6}/{0-7} in-flight=0 d0: 20(-M-)

Legacy IRQs (below 16) are always registered.

With the patch above I seem to be able to do PCI passthrough to an HVM
domU from a PVH dom0.

Regards, Roger.

---
diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 405d0a95af..cc53a3bd12 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     {
     case PHYSDEVOP_map_pirq:
     case PHYSDEVOP_unmap_pirq:
+        break;
+
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 41e3c4d5e4..50e23a093c 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -180,9 +180,7 @@ static int vioapic_hwdom_map_gsi(unsigned int gsi, unsigned int trig,
 
     /* Interrupt has been unmasked, bind it now. */
     ret = mp_register_gsi(gsi, trig, pol);
-    if ( ret == -EEXIST )
-        return 0;
-    if ( ret )
+    if ( ret && ret != -EEXIST )
     {
         gprintk(XENLOG_WARNING, "vioapic: error registering GSI %u: %d\n",
                  gsi, ret);
@@ -244,12 +242,18 @@ static void vioapic_write_redirent(
     }
     else
     {
+        int ret;
+
         unmasked = ent.fields.mask;
         /* Remote IRR and Delivery Status are read-only. */
         ent.bits = ((ent.bits >> 32) << 32) | val;
         ent.fields.delivery_status = 0;
         ent.fields.remote_irr = pent->fields.remote_irr;
         unmasked = unmasked && !ent.fields.mask;
+        ret = mp_register_gsi(gsi, ent.fields.trig_mode, ent.fields.polarity);
+        if ( ret && ret !=  -EEXIST )
+            gprintk(XENLOG_WARNING, "vioapic: error registering GSI %u: %d\n",
+                    gsi, ret);
     }
 
     *pent = ent;
Jan Beulich March 20, 2023, 3:29 p.m. UTC | #15
On 20.03.2023 16:16, Roger Pau Monné wrote:
> @@ -244,12 +242,18 @@ static void vioapic_write_redirent(
>      }
>      else
>      {
> +        int ret;
> +
>          unmasked = ent.fields.mask;
>          /* Remote IRR and Delivery Status are read-only. */
>          ent.bits = ((ent.bits >> 32) << 32) | val;
>          ent.fields.delivery_status = 0;
>          ent.fields.remote_irr = pent->fields.remote_irr;
>          unmasked = unmasked && !ent.fields.mask;
> +        ret = mp_register_gsi(gsi, ent.fields.trig_mode, ent.fields.polarity);
> +        if ( ret && ret !=  -EEXIST )
> +            gprintk(XENLOG_WARNING, "vioapic: error registering GSI %u: %d\n",
> +                    gsi, ret);
>      }

I assume this is only meant to be experimental, as I'm missing confinement
to Dom0 here. I also question this when the mask bit is set, as in that
case neither the trigger mode bit nor the polarity one can be relied upon.
At which point it would look to me as if it was necessary for Dom0 to use
a hypercall instead (which naturally would then be PHYSDEVOP_setup_gsi).

Jan
Roger Pau Monne March 20, 2023, 4:50 p.m. UTC | #16
On Mon, Mar 20, 2023 at 04:29:25PM +0100, Jan Beulich wrote:
> On 20.03.2023 16:16, Roger Pau Monné wrote:
> > @@ -244,12 +242,18 @@ static void vioapic_write_redirent(
> >      }
> >      else
> >      {
> > +        int ret;
> > +
> >          unmasked = ent.fields.mask;
> >          /* Remote IRR and Delivery Status are read-only. */
> >          ent.bits = ((ent.bits >> 32) << 32) | val;
> >          ent.fields.delivery_status = 0;
> >          ent.fields.remote_irr = pent->fields.remote_irr;
> >          unmasked = unmasked && !ent.fields.mask;
> > +        ret = mp_register_gsi(gsi, ent.fields.trig_mode, ent.fields.polarity);
> > +        if ( ret && ret !=  -EEXIST )
> > +            gprintk(XENLOG_WARNING, "vioapic: error registering GSI %u: %d\n",
> > +                    gsi, ret);
> >      }
> 
> I assume this is only meant to be experimental, as I'm missing confinement
> to Dom0 here.

Indeed.  I've attached a fixed version below, let's make sure this
doesn't influence testing.

> I also question this when the mask bit as set, as in that
> case neither the trigger mode bit nor the polarity one can be relied upon.
> At which point it would look to me as if it was necessary for Dom0 to use
> a hypercall instead (which naturally would then be PHYSDEVOP_setup_gsi).

AFAICT Linux does correctly set the trigger/polarity even when the
pins are masked, so this should be safe as a proof of concept. Let's
first figure out whether the issue is really with the lack of setup of
the IO-APIC pins.  In the end, without input from Ray this is just a
wild guess.

Regards, Roger.
----
diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 405d0a95af..cc53a3bd12 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     {
     case PHYSDEVOP_map_pirq:
     case PHYSDEVOP_unmap_pirq:
+        break;
+
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 41e3c4d5e4..64f7b5bcc5 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -180,9 +180,7 @@ static int vioapic_hwdom_map_gsi(unsigned int gsi, unsigned int trig,
 
     /* Interrupt has been unmasked, bind it now. */
     ret = mp_register_gsi(gsi, trig, pol);
-    if ( ret == -EEXIST )
-        return 0;
-    if ( ret )
+    if ( ret && ret != -EEXIST )
     {
         gprintk(XENLOG_WARNING, "vioapic: error registering GSI %u: %d\n",
                  gsi, ret);
@@ -250,6 +248,16 @@ static void vioapic_write_redirent(
         ent.fields.delivery_status = 0;
         ent.fields.remote_irr = pent->fields.remote_irr;
         unmasked = unmasked && !ent.fields.mask;
+        if ( is_hardware_domain(d) )
+        {
+            int ret = mp_register_gsi(gsi, ent.fields.trig_mode,
+                                      ent.fields.polarity);
+
+            if ( ret && ret !=  -EEXIST )
+                gprintk(XENLOG_WARNING,
+                        "vioapic: error registering GSI %u: %d\n",
+                        gsi, ret);
+        }
     }
 
     *pent = ent;
Jiqian Chen July 31, 2023, 4:40 p.m. UTC | #17
Hi,

On 2023/3/18 04:55, Stefano Stabellini wrote:
> On Fri, 17 Mar 2023, Roger Pau Monné wrote:
>> On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote:
>>> On Fri, 17 Mar 2023, Roger Pau Monné wrote:
>>>> On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote:
>>>>> On 17.03.2023 00:19, Stefano Stabellini wrote:
>>>>>> On Thu, 16 Mar 2023, Jan Beulich wrote:
>>>>>>> So yes, it then all boils down to that Linux-
>>>>>>> internal question.
>>>>>>
>>>>>> Excellent question but we'll have to wait for Ray as he is the one with
>>>>>> access to the hardware. But I have this data I can share in the
>>>>>> meantime:
>>>>>>
>>>>>> [    1.260378] IRQ to pin mappings:
>>>>>> [    1.260387] IRQ1 -> 0:1
>>>>>> [    1.260395] IRQ2 -> 0:2
>>>>>> [    1.260403] IRQ3 -> 0:3
>>>>>> [    1.260410] IRQ4 -> 0:4
>>>>>> [    1.260418] IRQ5 -> 0:5
>>>>>> [    1.260425] IRQ6 -> 0:6
>>>>>> [    1.260432] IRQ7 -> 0:7
>>>>>> [    1.260440] IRQ8 -> 0:8
>>>>>> [    1.260447] IRQ9 -> 0:9
>>>>>> [    1.260455] IRQ10 -> 0:10
>>>>>> [    1.260462] IRQ11 -> 0:11
>>>>>> [    1.260470] IRQ12 -> 0:12
>>>>>> [    1.260478] IRQ13 -> 0:13
>>>>>> [    1.260485] IRQ14 -> 0:14
>>>>>> [    1.260493] IRQ15 -> 0:15
>>>>>> [    1.260505] IRQ106 -> 1:8
>>>>>> [    1.260513] IRQ112 -> 1:4
>>>>>> [    1.260521] IRQ116 -> 1:13
>>>>>> [    1.260529] IRQ117 -> 1:14
>>>>>> [    1.260537] IRQ118 -> 1:15
>>>>>> [    1.260544] .................................... done.
>>>>>
>>>>> And what does Linux think are IRQs 16 ... 105? Have you compared with
>>>>> Linux running baremetal on the same hardware?
>>>>
>>>> So I have some emails from Ray from he time he was looking into this,
>>>> and on Linux dom0 PVH dmesg there is:
>>>>
>>>> [    0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23
>>>> [    0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55
>>>>
>>>> So it seems the vIO-APIC data provided by Xen to dom0 is at least
>>>> consistent.
>>>>  
>>>>>> And I think Ray traced the point in Linux where Linux gives us an IRQ ==
>>>>>> 112 (which is the one causing issues):
>>>>>>
>>>>>> __acpi_register_gsi->
>>>>>>         acpi_register_gsi_ioapic->
>>>>>>                 mp_map_gsi_to_irq->
>>>>>>                         mp_map_pin_to_irq->
>>>>>>                                 __irq_resolve_mapping()
>>>>>>
>>>>>>         if (likely(data)) {
>>>>>>                 desc = irq_data_to_desc(data);
>>>>>>                 if (irq)
>>>>>>                         *irq = data->irq;
>>>>>>                 /* this IRQ is 112, IO-APIC-34 domain */
>>>>>>         }
>>>>
>>>>
>>>> Could this all be a result of patch 4/5 in the Linux series ("[RFC
>>>> PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different
>>>> __acpi_register_gsi hook is installed for PVH in order to setup GSIs
>>>> using PHYSDEV ops instead of doing it natively from the IO-APIC?
>>>>
>>>> FWIW, the introduced function in that patch
>>>> (acpi_register_gsi_xen_pvh()) seems to unconditionally call
>>>> acpi_register_gsi_ioapic() without checking if the GSI is already
>>>> registered, which might lead to multiple IRQs being allocated for the
>>>> same underlying GSI?
>>>
>>> I understand this point and I think it needs investigating.
>>>
>>>
>>>> As I commented there, I think that approach is wrong.  If the GSI has
>>>> not been mapped in Xen (because dom0 hasn't unmasked the respective
>>>> IO-APIC pin) we should add some logic in the toolstack to map it
>>>> before attempting to bind.
>>>
>>> But this statement confuses me. The toolstack doesn't get involved in
>>> IRQ setup for PCI devices for HVM guests?
>>
>> It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call
>> to xc_physdev_map_pirq().  I'm not sure whether that's a remnant that
>> cold be removed (maybe for qemu-trad only?) or it's also required by
>> QEMU upstream, I would have to investigate more.
> 
> You are right. I am not certain, but it seems like a mistake in the
> toolstack to me. In theory, pci_add_dm_done should only be needed for PV
> guests, not for HVM guests. I am not sure. But I can see the call to
> xc_physdev_map_pirq you were referring to now.
> 
> 
>> It's my understanding it's in pci_add_dm_done() where Ray was getting
>> the mismatched IRQ vs GSI number.
> 
> I think the mismatch was actually caused by the xc_physdev_map_pirq call
> from QEMU, which makes sense because in any case it should happen before
> the same call done by pci_add_dm_done (pci_add_dm_done is called after
> sending the pci passthrough QMP command to QEMU). So the first to hit
> the IRQ!=GSI problem would be QEMU.


Sorry for replying so late, and thank you all for the review. Your questions mainly focus on the following points:
1. Why is the IRQ not equal to the GSI?
2. Why do I translate between IRQ and GSI?
3. Why do I call PHYSDEVOP_map_pirq in acpi_register_gsi_xen_pvh()?
4. Why do I call PHYSDEVOP_setup_gsi in acpi_register_gsi_xen_pvh()?
Please forgive me for giving a summary response first; I am looking forward to your comments.

1. Why is the IRQ not equal to the GSI?
As far as I know, the IRQ is dynamically allocated when the GSI is registered, so the two are not necessarily equal.
When I run "sudo xl pci-assignable-add 03:00.0" to make the passthrough device assignable (taking the dGPU in my environment as an example, whose GSI is 28), it calls into acpi_register_gsi_ioapic() to get an IRQ. The call stack is:
acpi_register_gsi_ioapic
	mp_map_gsi_to_irq
		mp_map_pin_to_irq
			irq_find_mapping(if gsi has been mapped to an irq before, it will return corresponding irq here)
			alloc_irq_from_domain
				__irq_domain_alloc_irqs
					irq_domain_alloc_descs
						__irq_alloc_descs

If you add some debug printks like the ones below:
---------------------------------------------------------------------------------------------------------------------------------------------
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index a868b76cd3d4..970fd461be7a 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1067,6 +1067,8 @@ static int mp_map_pin_to_irq(u32 gsi, int idx, int ioapic, int pin,
                }
        }
        mutex_unlock(&ioapic_mutex);
+       printk("cjq_debug mp_map_pin_to_irq gsi: %u, irq: %d, idx: %d, ioapic: %d, pin: %d\n",
+                       gsi, irq, idx, ioapic, pin);

        return irq;
 }
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 5db0230aa6b5..4e9613abbe96 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -786,6 +786,8 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
        start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS,
                                           from, cnt, 0);
        ret = -EEXIST;
+       printk("cjq_debug __irq_alloc_descs irq: %d, from: %u, cnt: %u, node: %d, start: %d, nr_irqs: %d\n",
+                       irq, from, cnt, node, start, nr_irqs);
        if (irq >=0 && start != irq)
                goto unlock;
---------------------------------------------------------------------------------------------------------------------------------------------
You will get output on PVH dom0:

[    0.181560] cjq_debug __irq_alloc_descs irq: 1, from: 1, cnt: 1, node: -1, start: 1, nr_irqs: 1096
[    0.181639] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1
[    0.181641] cjq_debug __irq_alloc_descs irq: 2, from: 2, cnt: 1, node: -1, start: 2, nr_irqs: 1096
[    0.181682] cjq_debug mp_map_pin_to_irq gsi: 2, irq: 2, idx: 0, ioapic: 0, pin: 2
[    0.181683] cjq_debug __irq_alloc_descs irq: 3, from: 3, cnt: 1, node: -1, start: 3, nr_irqs: 1096
[    0.181715] cjq_debug mp_map_pin_to_irq gsi: 3, irq: 3, idx: 3, ioapic: 0, pin: 3
[    0.181716] cjq_debug __irq_alloc_descs irq: 4, from: 4, cnt: 1, node: -1, start: 4, nr_irqs: 1096
[    0.181751] cjq_debug mp_map_pin_to_irq gsi: 4, irq: 4, idx: 4, ioapic: 0, pin: 4
[    0.181752] cjq_debug __irq_alloc_descs irq: 5, from: 5, cnt: 1, node: -1, start: 5, nr_irqs: 1096
[    0.181783] cjq_debug mp_map_pin_to_irq gsi: 5, irq: 5, idx: 5, ioapic: 0, pin: 5
[    0.181784] cjq_debug __irq_alloc_descs irq: 6, from: 6, cnt: 1, node: -1, start: 6, nr_irqs: 1096
[    0.181813] cjq_debug mp_map_pin_to_irq gsi: 6, irq: 6, idx: 6, ioapic: 0, pin: 6
[    0.181814] cjq_debug __irq_alloc_descs irq: 7, from: 7, cnt: 1, node: -1, start: 7, nr_irqs: 1096
[    0.181856] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7
[    0.181857] cjq_debug __irq_alloc_descs irq: 8, from: 8, cnt: 1, node: -1, start: 8, nr_irqs: 1096
[    0.181888] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8
[    0.181889] cjq_debug __irq_alloc_descs irq: 9, from: 9, cnt: 1, node: -1, start: 9, nr_irqs: 1096
[    0.181918] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9
[    0.181919] cjq_debug __irq_alloc_descs irq: 10, from: 10, cnt: 1, node: -1, start: 10, nr_irqs: 1096
[    0.181950] cjq_debug mp_map_pin_to_irq gsi: 10, irq: 10, idx: 9, ioapic: 0, pin: 10
[    0.181951] cjq_debug __irq_alloc_descs irq: 11, from: 11, cnt: 1, node: -1, start: 11, nr_irqs: 1096
[    0.181977] cjq_debug mp_map_pin_to_irq gsi: 11, irq: 11, idx: 10, ioapic: 0, pin: 11
[    0.181979] cjq_debug __irq_alloc_descs irq: 12, from: 12, cnt: 1, node: -1, start: 12, nr_irqs: 1096
[    0.182006] cjq_debug mp_map_pin_to_irq gsi: 12, irq: 12, idx: 11, ioapic: 0, pin: 12
[    0.182007] cjq_debug __irq_alloc_descs irq: 13, from: 13, cnt: 1, node: -1, start: 13, nr_irqs: 1096
[    0.182034] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13
[    0.182035] cjq_debug __irq_alloc_descs irq: 14, from: 14, cnt: 1, node: -1, start: 14, nr_irqs: 1096
[    0.182066] cjq_debug mp_map_pin_to_irq gsi: 14, irq: 14, idx: 13, ioapic: 0, pin: 14
[    0.182067] cjq_debug __irq_alloc_descs irq: 15, from: 15, cnt: 1, node: -1, start: 15, nr_irqs: 1096
[    0.182095] cjq_debug mp_map_pin_to_irq gsi: 15, irq: 15, idx: 14, ioapic: 0, pin: 15
[    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 24, nr_irqs: 1096
[    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096
[    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 26, nr_irqs: 1096
[    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 27, nr_irqs: 1096
[    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 28, nr_irqs: 1096
[    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 29, nr_irqs: 1096
[    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 30, nr_irqs: 1096
[    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 31, nr_irqs: 1096
[    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 32, nr_irqs: 1096
[    0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 33, nr_irqs: 1096
[    0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 34, nr_irqs: 1096
[    0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 35, nr_irqs: 1096
[    0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 36, nr_irqs: 1096
[    0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 37, nr_irqs: 1096
[    0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 38, nr_irqs: 1096
[    0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 39, nr_irqs: 1096
[    0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 40, nr_irqs: 1096
[    0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 41, nr_irqs: 1096
[    0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 42, nr_irqs: 1096
[    0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 43, nr_irqs: 1096
[    0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 44, nr_irqs: 1096
[    0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 45, nr_irqs: 1096
[    0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 46, nr_irqs: 1096
[    0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 47, nr_irqs: 1096
[    0.198199] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 48, nr_irqs: 1096
[    0.198416] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 49, nr_irqs: 1096
[    0.198460] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 50, nr_irqs: 1096
[    0.198489] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 51, nr_irqs: 1096
[    0.198523] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 52, nr_irqs: 1096
[    0.201315] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 53, nr_irqs: 1096
[    0.202174] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 54, nr_irqs: 1096
[    0.202225] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 55, nr_irqs: 1096
[    0.202259] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 56, nr_irqs: 1096
[    0.202291] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 57, nr_irqs: 1096
[    0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 58, nr_irqs: 1096
[    0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 59, nr_irqs: 1096
[    0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 60, nr_irqs: 1096
[    0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 61, nr_irqs: 1096
[    0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 62, nr_irqs: 1096
[    0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 63, nr_irqs: 1096
[    0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 64, nr_irqs: 1096
[    0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 65, nr_irqs: 1096
[    0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 66, nr_irqs: 1096
[    0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 67, nr_irqs: 1096
[    0.210169] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 68, nr_irqs: 1096
[    0.210322] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 69, nr_irqs: 1096
[    0.210370] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 70, nr_irqs: 1096
[    0.210403] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 71, nr_irqs: 1096
[    0.210436] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 72, nr_irqs: 1096
[    0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 73, nr_irqs: 1096
[    0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 74, nr_irqs: 1096
[    0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 75, nr_irqs: 1096
[    0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 76, nr_irqs: 1096
[    0.214151] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 77, nr_irqs: 1096
[    0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 78, nr_irqs: 1096
[    0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 79, nr_irqs: 1096
[    0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 80, nr_irqs: 1096
[    0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 81, nr_irqs: 1096
[    0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 82, nr_irqs: 1096
[    0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 83, nr_irqs: 1096
[    0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 84, nr_irqs: 1096
[    0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 85, nr_irqs: 1096
[    0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 86, nr_irqs: 1096
[    0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 87, nr_irqs: 1096
[    0.222215] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 88, nr_irqs: 1096
[    0.222366] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 89, nr_irqs: 1096
[    0.222410] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 90, nr_irqs: 1096
[    0.222447] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 91, nr_irqs: 1096
[    0.222478] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 92, nr_irqs: 1096
[    0.225490] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 93, nr_irqs: 1096
[    0.226225] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 94, nr_irqs: 1096
[    0.226268] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 95, nr_irqs: 1096
[    0.226300] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 96, nr_irqs: 1096
[    0.226329] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 97, nr_irqs: 1096
[    0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 98, nr_irqs: 1096
[    0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 99, nr_irqs: 1096
[    0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 100, nr_irqs: 1096
[    0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 101, nr_irqs: 1096
[    0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 102, nr_irqs: 1096
[    0.232399] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 103, nr_irqs: 1096
[    0.248854] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 104, nr_irqs: 1096
[    0.250609] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 105, nr_irqs: 1096
[    0.372343] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9
[    0.720950] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8
[    0.721052] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13
[    1.254825] cjq_debug mp_map_pin_to_irq gsi: 7, irq: -16, idx: 7, ioapic: 0, pin: 7
[    1.333081] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1
[    1.375882] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 106, nr_irqs: 1096
[    1.375951] cjq_debug mp_map_pin_to_irq gsi: 32, irq: 106, idx: -1, ioapic: 1, pin: 8
[    1.376072] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096
[    1.376121] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 107, idx: -1, ioapic: 1, pin: 13
[    1.472551] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 107, idx: -1, ioapic: 1, pin: 13
[    1.472697] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096
[    1.472751] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 107, idx: -1, ioapic: 1, pin: 14
[    1.484290] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 107, idx: -1, ioapic: 1, pin: 14
[    1.768163] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096
[    1.768627] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 108, nr_irqs: 1096
[    1.769059] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 109, nr_irqs: 1096
[    1.769694] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 110, nr_irqs: 1096
[    1.770169] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 111, nr_irqs: 1096
[    1.770697] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 112, nr_irqs: 1096
[    1.770738] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 112, idx: -1, ioapic: 1, pin: 4
[    1.770789] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 113, nr_irqs: 1096
[    1.771230] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 112, idx: -1, ioapic: 1, pin: 4
[    1.771278] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 114, nr_irqs: 1096
[    2.127884] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 115, nr_irqs: 1096
[    3.207419] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 116, nr_irqs: 1096
[    3.207730] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13
[    3.208120] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 117, nr_irqs: 1096
[    3.208475] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12
[    3.208478] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 118, nr_irqs: 1096
[    3.208861] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13
[    3.208933] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 119, nr_irqs: 1096
[    3.209127] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 120, nr_irqs: 1096
[    3.209383] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 121, nr_irqs: 1096
[    3.209863] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 122, nr_irqs: 1096
[    3.211439] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 123, nr_irqs: 1096
[    3.211833] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 124, nr_irqs: 1096
[    3.212873] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 125, nr_irqs: 1096
[    3.243514] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 126, nr_irqs: 1096
[    3.243689] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 126, idx: -1, ioapic: 1, pin: 14
[    3.244293] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 127, nr_irqs: 1096
[    3.244534] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 128, nr_irqs: 1096
[    3.244714] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 129, nr_irqs: 1096
[    3.244911] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 130, nr_irqs: 1096
[    3.245096] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 131, nr_irqs: 1096
[    3.245633] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 132, nr_irqs: 1096
[    3.247890] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 133, nr_irqs: 1096
[    3.248192] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 134, nr_irqs: 1096
[    3.271093] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 135, nr_irqs: 1096
[    3.307045] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 136, nr_irqs: 1096
[    3.307162] cjq_debug mp_map_pin_to_irq gsi: 48, irq: 136, idx: -1, ioapic: 1, pin: 24
[    3.307223] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 137, nr_irqs: 1096
[    3.331183] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 137, nr_irqs: 1096
[    3.331295] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 138, nr_irqs: 1096
[    3.331366] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 139, nr_irqs: 1096
[    3.331438] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 140, nr_irqs: 1096
[    3.331511] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 141, nr_irqs: 1096
[    3.331579] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 142, nr_irqs: 1096
[    3.331646] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 143, nr_irqs: 1096
[    3.331713] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 144, nr_irqs: 1096
[    3.331780] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 145, nr_irqs: 1096
[    3.331846] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 146, nr_irqs: 1096
[    3.331913] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 147, nr_irqs: 1096
[    3.331984] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 148, nr_irqs: 1096
[    3.332051] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 149, nr_irqs: 1096
[    3.332118] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 150, nr_irqs: 1096
[    3.332183] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 151, nr_irqs: 1096
[    3.332252] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 152, nr_irqs: 1096
[    3.332319] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 153, nr_irqs: 1096
[    8.010370] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13
[    9.545439] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12
[    9.545713] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 154, nr_irqs: 1096
[    9.546034] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 155, nr_irqs: 1096
[    9.687796] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 156, nr_irqs: 1096
[    9.687979] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 156, idx: -1, ioapic: 1, pin: 15
[    9.688057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 157, nr_irqs: 1096
[    9.921038] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 158, nr_irqs: 1096
[    9.921210] cjq_debug mp_map_pin_to_irq gsi: 29, irq: 158, idx: -1, ioapic: 1, pin: 5
[    9.921403] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 159, nr_irqs: 1096
[    9.926373] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 156, idx: -1, ioapic: 1, pin: 15
[    9.926747] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 160, nr_irqs: 1096
[    9.928201] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12
[    9.928488] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 161, nr_irqs: 1096
[   10.653915] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 162, nr_irqs: 1096
[   10.656257] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 163, nr_irqs: 1096

You can see that the IRQ number is not always derived from the value of the GSI. Allocation is first-come, first-served: for example, GSI 32 gets IRQ 106 while GSI 28 gets IRQ 112. And acpi_register_gsi_ioapic() is not the only caller of __irq_alloc_descs(); other code paths call it as well, some of them earlier.
The allocation behaviour above matches bare metal, so we can conclude that IRQ != GSI. For comparison, see the output below from Linux running on bare metal:

[    0.105053] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1
[    0.105061] cjq_debug mp_map_pin_to_irq gsi: 2, irq: 0, idx: 0, ioapic: 0, pin: 2
[    0.105069] cjq_debug mp_map_pin_to_irq gsi: 3, irq: 3, idx: 3, ioapic: 0, pin: 3
[    0.105078] cjq_debug mp_map_pin_to_irq gsi: 4, irq: 4, idx: 4, ioapic: 0, pin: 4
[    0.105086] cjq_debug mp_map_pin_to_irq gsi: 5, irq: 5, idx: 5, ioapic: 0, pin: 5
[    0.105094] cjq_debug mp_map_pin_to_irq gsi: 6, irq: 6, idx: 6, ioapic: 0, pin: 6
[    0.105103] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7
[    0.105111] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8
[    0.105119] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9
[    0.105127] cjq_debug mp_map_pin_to_irq gsi: 10, irq: 10, idx: 9, ioapic: 0, pin: 10
[    0.105136] cjq_debug mp_map_pin_to_irq gsi: 11, irq: 11, idx: 10, ioapic: 0, pin: 11
[    0.105144] cjq_debug mp_map_pin_to_irq gsi: 12, irq: 12, idx: 11, ioapic: 0, pin: 12
[    0.105152] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13
[    0.105160] cjq_debug mp_map_pin_to_irq gsi: 14, irq: 14, idx: 13, ioapic: 0, pin: 14
[    0.105169] cjq_debug mp_map_pin_to_irq gsi: 15, irq: 15, idx: 14, ioapic: 0, pin: 15
[    0.398134] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9
[    1.169293] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8
[    1.169394] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13
[    1.323132] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7
[    1.345425] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1
[    1.375502] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 24, nr_irqs: 1096
[    1.375575] cjq_debug mp_map_pin_to_irq gsi: 32, irq: 24, idx: -1, ioapic: 1, pin: 8
[    1.375661] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096
[    1.375705] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 25, idx: -1, ioapic: 1, pin: 13
[    1.442277] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 25, idx: -1, ioapic: 1, pin: 13
[    1.442393] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096
[    1.442450] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 25, idx: -1, ioapic: 1, pin: 14
[    1.453893] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 25, idx: -1, ioapic: 1, pin: 14
[    1.456127] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096
[    1.734065] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 26, nr_irqs: 1096
[    1.734165] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 27, nr_irqs: 1096
[    1.734253] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 28, nr_irqs: 1096
[    1.734344] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 29, nr_irqs: 1096
[    1.734426] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 30, nr_irqs: 1096
[    1.734512] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 31, nr_irqs: 1096
[    1.734597] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 32, nr_irqs: 1096
[    1.734643] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 33, nr_irqs: 1096
[    1.734687] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 34, nr_irqs: 1096
[    1.734728] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 35, nr_irqs: 1096
[    1.735017] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 36, nr_irqs: 1096
[    1.735252] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 37, nr_irqs: 1096
[    1.735467] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 38, nr_irqs: 1096
[    1.735799] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 39, nr_irqs: 1096
[    1.736024] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 40, nr_irqs: 1096
[    1.736364] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 41, nr_irqs: 1096
[    1.736406] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 41, idx: -1, ioapic: 1, pin: 4
[    1.736434] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 42, nr_irqs: 1096
[    1.736701] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 41, idx: -1, ioapic: 1, pin: 4
[    1.736724] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 43, nr_irqs: 1096
[    3.037123] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 44, nr_irqs: 1096
[    3.037313] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13
[    3.037515] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13
[    3.037738] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 45, nr_irqs: 1096
[    3.037959] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 46, nr_irqs: 1096
[    3.038073] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 47, nr_irqs: 1096
[    3.038154] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 48, nr_irqs: 1096
[    3.038179] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12
[    3.038277] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 49, nr_irqs: 1096
[    3.038399] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 50, nr_irqs: 1096
[    3.038525] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 51, nr_irqs: 1096
[    3.038657] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 52, nr_irqs: 1096
[    3.038852] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 53, nr_irqs: 1096
[    3.052377] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 54, nr_irqs: 1096
[    3.052479] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 54, idx: -1, ioapic: 1, pin: 14
[    3.052730] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 55, nr_irqs: 1096
[    3.052840] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 56, nr_irqs: 1096
[    3.052918] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 57, nr_irqs: 1096
[    3.052987] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 58, nr_irqs: 1096
[    3.053069] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 59, nr_irqs: 1096
[    3.053139] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 60, nr_irqs: 1096
[    3.053201] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 61, nr_irqs: 1096
[    3.053260] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 62, nr_irqs: 1096
[    3.089128] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 63, nr_irqs: 1096
[    3.089310] cjq_debug mp_map_pin_to_irq gsi: 48, irq: 63, idx: -1, ioapic: 1, pin: 24
[    3.089376] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 64, nr_irqs: 1096
[    3.103435] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 65, nr_irqs: 1096
[    3.114190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 64, nr_irqs: 1096
[    3.114346] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 66, nr_irqs: 1096
[    3.121215] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 67, nr_irqs: 1096
[    3.121350] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 68, nr_irqs: 1096
[    3.121479] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 69, nr_irqs: 1096
[    3.121612] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 70, nr_irqs: 1096
[    3.121726] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 71, nr_irqs: 1096
[    3.121841] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 72, nr_irqs: 1096
[    3.121955] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 73, nr_irqs: 1096
[    3.122025] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 74, nr_irqs: 1096
[    3.122093] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 75, nr_irqs: 1096
[    3.122148] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 76, nr_irqs: 1096
[    3.122203] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 77, nr_irqs: 1096
[    3.122265] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 78, nr_irqs: 1096
[    3.122322] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 79, nr_irqs: 1096
[    3.122378] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 80, nr_irqs: 1096
[    3.122433] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 81, nr_irqs: 1096
[    7.838753] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13
[    9.619174] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12
[    9.619556] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 82, nr_irqs: 1096
[    9.622038] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 83, nr_irqs: 1096
[    9.634900] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 84, nr_irqs: 1096
[    9.635316] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 84, idx: -1, ioapic: 1, pin: 15
[    9.635405] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 85, nr_irqs: 1096
[   10.006686] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 86, nr_irqs: 1096
[   10.006823] cjq_debug mp_map_pin_to_irq gsi: 29, irq: 86, idx: -1, ioapic: 1, pin: 5
[   10.007009] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 87, nr_irqs: 1096
[   10.008723] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 84, idx: -1, ioapic: 1, pin: 15
[   10.009853] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 88, nr_irqs: 1096
[   10.010786] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12
[   10.010858] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 89, nr_irqs: 1096

2. Why do I translate between IRQ and GSI?

After answering question 1, we know that IRQ != GSI. I found that, in QEMU, pci_qdev_realize->xen_pt_realize->xen_host_pci_device_get->xen_host_pci_get_hex_value reads the IRQ number, but later pci_qdev_realize->xen_pt_realize->xc_physdev_map_pirq requires us to pass in the GSI: it calls into Xen's physdev_map_pirq->allocate_and_map_gsi_pirq, which allocates a pirq for the GSI. That is where the error occurred.
Not only that, the callback function pci_add_dm_done->xc_physdev_map_pirq also needs the GSI.

So I added the function xc_physdev_gsi_from_irq() to translate the IRQ into the GSI for QEMU.

I didn't find a similar facility in the existing Linux code, and I think only "QEMU passthrough for Xen" needs this translation, so I added it into privcmd. If you know of any other similar function or a more suitable place, please feel free to tell me.
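
For illustration, a minimal sketch of what the consumer side of this translation could look like. It assumes the xc_physdev_gsi_from_irq() wrapper introduced by this series (its exact name and signature may still change) and is not the actual QEMU code:

#include <xenctrl.h>

/*
 * Sketch only: map the interrupt of a passthrough device for a guest,
 * translating the host IRQ into a GSI first.  xc_physdev_gsi_from_irq()
 * is the wrapper proposed by this series.  Error handling is reduced
 * to the minimum.
 */
static int map_host_irq_for_guest(xc_interface *xch, uint32_t domid, int host_irq)
{
    int gsi = xc_physdev_gsi_from_irq(xch, host_irq);
    int pirq, rc;

    if (gsi < 0)
        gsi = host_irq;   /* assume IRQ == GSI when no translation is available */

    pirq = gsi;           /* request pirq == GSI, like the toolstack does */
    rc = xc_physdev_map_pirq(xch, domid, gsi, &pirq);
    if (rc < 0)
        return rc;

    return pirq;          /* the pirq the guest gets bound to */
}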

3. Why do I call PHYSDEVOP_map_pirq in acpi_register_gsi_xen_pvh()?

Because if you want to map a GSI for a domU, it must already have a mapping in dom0. See the toolstack (libxl) code:
pci_add_dm_done
	xc_physdev_map_pirq
	xc_domain_irq_permission
		XEN_DOMCTL_irq_permission
			pirq_access_permitted
xc_physdev_map_pirq gets the pirq that was mapped from the GSI, and xc_domain_irq_permission passes that pirq into Xen. If we don't do PHYSDEVOP_map_pirq for passthrough devices on PVH dom0, then pirq_access_permitted finds no IRQ mapping in dom0 and fails.

So I added PHYSDEVOP_map_pirq for PVH dom0. But I think it is only necessary for passthrough devices, not for all devices that call __acpi_register_gsi. In the next version of the patch, I will restrict PHYSDEVOP_map_pirq to passthrough devices only.
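
As a rough sketch, and not the exact code from the Linux series, the dom0 side of this mapping could look like the following, using Linux's existing hypercall helpers:

#include <xen/interface/xen.h>       /* DOMID_SELF */
#include <xen/interface/physdev.h>
#include <asm/xen/hypercall.h>

/*
 * Sketch only: give PVH dom0 a pirq for the GSI of a passthrough
 * device, so that a later XEN_DOMCTL_irq_permission from the toolstack
 * finds an IRQ mapping in dom0 and pirq_access_permitted() succeeds.
 */
static int pvh_map_gsi_in_dom0(int gsi)
{
    struct physdev_map_pirq map_irq = {
        .domid = DOMID_SELF,
        .type  = MAP_PIRQ_TYPE_GSI,
        .index = gsi,
        .pirq  = gsi,
    };

    return HYPERVISOR_physdev_op(PHYSDEVOP_map_pirq, &map_irq);
}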

4. Why do I call PHYSDEVOP_setup_gsi in acpi_register_gsi_xen_pvh()?

As Roger commented, the GSI of the passthrough device is never unmasked and registered (I added printks in vioapic_hwdom_map_gsi() and found that it is never called for the dGPU with GSI 28 in my environment).
So I called PHYSDEVOP_setup_gsi to register the GSI.
But I agree with Roger's and Jan's opinion that it is wrong to do PHYSDEVOP_setup_gsi for all devices.
So, in the next version of the patch, I will also restrict PHYSDEVOP_setup_gsi to passthrough devices only.
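
For reference, a rough sketch (again, not the exact patch) of registering a GSI from Linux via PHYSDEVOP_setup_gsi; the trigger/polarity encoding is the one already used by the PV dom0 path in arch/x86/pci/xen.c:

#include <xen/interface/physdev.h>
#include <asm/xen/hypercall.h>

/*
 * Sketch only: ask Xen to configure the IO-APIC pin for a GSI that dom0
 * itself never unmasks (e.g. a device reserved for passthrough).
 * Encoding: triggering 0 = edge, 1 = level; polarity 0 = active high,
 * 1 = active low.
 */
static int pvh_setup_gsi(int gsi, int trigger, int polarity)
{
    struct physdev_setup_gsi setup_gsi = {
        .gsi        = gsi,
        .triggering = trigger,
        .polarity   = polarity,
    };

    return HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi);
}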
Roger Pau Monne Aug. 23, 2023, 8:57 a.m. UTC | #18
On Mon, Jul 31, 2023 at 04:40:35PM +0000, Chen, Jiqian wrote:
> Hi,
> 
> On 2023/3/18 04:55, Stefano Stabellini wrote:
> > On Fri, 17 Mar 2023, Roger Pau Monné wrote:
> >> On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote:
> >>> On Fri, 17 Mar 2023, Roger Pau Monné wrote:
> >>>> On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote:
> >>>>> On 17.03.2023 00:19, Stefano Stabellini wrote:
> >>>>>> On Thu, 16 Mar 2023, Jan Beulich wrote:
> >>>>>>> So yes, it then all boils down to that Linux-
> >>>>>>> internal question.
> >>>>>>
> >>>>>> Excellent question but we'll have to wait for Ray as he is the one with
> >>>>>> access to the hardware. But I have this data I can share in the
> >>>>>> meantime:
> >>>>>>
> >>>>>> [    1.260378] IRQ to pin mappings:
> >>>>>> [    1.260387] IRQ1 -> 0:1
> >>>>>> [    1.260395] IRQ2 -> 0:2
> >>>>>> [    1.260403] IRQ3 -> 0:3
> >>>>>> [    1.260410] IRQ4 -> 0:4
> >>>>>> [    1.260418] IRQ5 -> 0:5
> >>>>>> [    1.260425] IRQ6 -> 0:6
> >>>>>> [    1.260432] IRQ7 -> 0:7
> >>>>>> [    1.260440] IRQ8 -> 0:8
> >>>>>> [    1.260447] IRQ9 -> 0:9
> >>>>>> [    1.260455] IRQ10 -> 0:10
> >>>>>> [    1.260462] IRQ11 -> 0:11
> >>>>>> [    1.260470] IRQ12 -> 0:12
> >>>>>> [    1.260478] IRQ13 -> 0:13
> >>>>>> [    1.260485] IRQ14 -> 0:14
> >>>>>> [    1.260493] IRQ15 -> 0:15
> >>>>>> [    1.260505] IRQ106 -> 1:8
> >>>>>> [    1.260513] IRQ112 -> 1:4
> >>>>>> [    1.260521] IRQ116 -> 1:13
> >>>>>> [    1.260529] IRQ117 -> 1:14
> >>>>>> [    1.260537] IRQ118 -> 1:15
> >>>>>> [    1.260544] .................................... done.
> >>>>>
> >>>>> And what does Linux think are IRQs 16 ... 105? Have you compared with
> >>>>> Linux running baremetal on the same hardware?
> >>>>
> >>>> So I have some emails from Ray from he time he was looking into this,
> >>>> and on Linux dom0 PVH dmesg there is:
> >>>>
> >>>> [    0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23
> >>>> [    0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55
> >>>>
> >>>> So it seems the vIO-APIC data provided by Xen to dom0 is at least
> >>>> consistent.
> >>>>  
> >>>>>> And I think Ray traced the point in Linux where Linux gives us an IRQ ==
> >>>>>> 112 (which is the one causing issues):
> >>>>>>
> >>>>>> __acpi_register_gsi->
> >>>>>>         acpi_register_gsi_ioapic->
> >>>>>>                 mp_map_gsi_to_irq->
> >>>>>>                         mp_map_pin_to_irq->
> >>>>>>                                 __irq_resolve_mapping()
> >>>>>>
> >>>>>>         if (likely(data)) {
> >>>>>>                 desc = irq_data_to_desc(data);
> >>>>>>                 if (irq)
> >>>>>>                         *irq = data->irq;
> >>>>>>                 /* this IRQ is 112, IO-APIC-34 domain */
> >>>>>>         }
> >>>>
> >>>>
> >>>> Could this all be a result of patch 4/5 in the Linux series ("[RFC
> >>>> PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different
> >>>> __acpi_register_gsi hook is installed for PVH in order to setup GSIs
> >>>> using PHYSDEV ops instead of doing it natively from the IO-APIC?
> >>>>
> >>>> FWIW, the introduced function in that patch
> >>>> (acpi_register_gsi_xen_pvh()) seems to unconditionally call
> >>>> acpi_register_gsi_ioapic() without checking if the GSI is already
> >>>> registered, which might lead to multiple IRQs being allocated for the
> >>>> same underlying GSI?
> >>>
> >>> I understand this point and I think it needs investigating.
> >>>
> >>>
> >>>> As I commented there, I think that approach is wrong.  If the GSI has
> >>>> not been mapped in Xen (because dom0 hasn't unmasked the respective
> >>>> IO-APIC pin) we should add some logic in the toolstack to map it
> >>>> before attempting to bind.
> >>>
> >>> But this statement confuses me. The toolstack doesn't get involved in
> >>> IRQ setup for PCI devices for HVM guests?
> >>
> >> It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call
> >> to xc_physdev_map_pirq().  I'm not sure whether that's a remnant that
> >> cold be removed (maybe for qemu-trad only?) or it's also required by
> >> QEMU upstream, I would have to investigate more.
> > 
> > You are right. I am not certain, but it seems like a mistake in the
> > toolstack to me. In theory, pci_add_dm_done should only be needed for PV
> > guests, not for HVM guests. I am not sure. But I can see the call to
> > xc_physdev_map_pirq you were referring to now.
> > 
> > 
> >> It's my understanding it's in pci_add_dm_done() where Ray was getting
> >> the mismatched IRQ vs GSI number.
> > 
> > I think the mismatch was actually caused by the xc_physdev_map_pirq call
> > from QEMU, which makes sense because in any case it should happen before
> > the same call done by pci_add_dm_done (pci_add_dm_done is called after
> > sending the pci passthrough QMP command to QEMU). So the first to hit
> > the IRQ!=GSI problem would be QEMU.
> 
> 
> Sorry for replying to you so late. And thank you all for review. I realized that your questions mainly focus on the following points: 1. Why irq is not equal with gsi? 2. Why I do the translations between irq and gsi? 3. Why I call PHYSDEVOP_map_pirq in acpi_register_gsi_xen_pvh()? 4. Why I call PHYSDEVOP_setup_gsi in acpi_register_gsi_xen_pvh()? 
> Please forgive me for making a summary response first. And I am looking forward to your comments.

Sorry, it's been a bit since that conversation, so my recollection is
vague.

One of the questions was why acpi_register_gsi_xen_pvh() is needed.  I
think the patch that introduced it on Linux didn't have much of a
commit description.

> 1. Why irq is not equal with gsi?
> As far as I know, irq is dynamically allocated according to gsi, they are not necessarily equal.
> When I run "sudo xl pci-assignable-add 03:00.0" to assign passthrough device(Taking dGPU on my environment as an example, which gsi is 28). It will call into acpi_register_gsi_ioapic to get irq, the callstack is: 
> acpi_register_gsi_ioapic
> 	mp_map_gsi_to_irq
> 		mp_map_pin_to_irq
> 			irq_find_mapping(if gsi has been mapped to an irq before, it will return corresponding irq here)
> 			alloc_irq_from_domain
> 				__irq_domain_alloc_irqs
> 					irq_domain_alloc_descs
> 						__irq_alloc_descs

Won't you perform double GSI registrations with Xen if both
acpi_register_gsi_ioapic() and acpi_register_gsi_xen_pvh() are used?

> 
> If you add some printings like below:
> ---------------------------------------------------------------------------------------------------------------------------------------------
> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index a868b76cd3d4..970fd461be7a 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -1067,6 +1067,8 @@ static int mp_map_pin_to_irq(u32 gsi, int idx, int ioapic, int pin,
>                 }
>         }
>         mutex_unlock(&ioapic_mutex);
> +       printk("cjq_debug mp_map_pin_to_irq gsi: %u, irq: %d, idx: %d, ioapic: %d, pin: %d\n",
> +                       gsi, irq, idx, ioapic, pin);
> 
>         return irq;
>  }
> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
> index 5db0230aa6b5..4e9613abbe96 100644
> --- a/kernel/irq/irqdesc.c
> +++ b/kernel/irq/irqdesc.c
> @@ -786,6 +786,8 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
>         start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS,
>                                            from, cnt, 0);
>         ret = -EEXIST;
> +       printk("cjq_debug __irq_alloc_descs irq: %d, from: %u, cnt: %u, node: %d, start: %d, nr_irqs: %d\n",
> +                       irq, from, cnt, node, start, nr_irqs);
>         if (irq >=0 && start != irq)
>                 goto unlock;
> ---------------------------------------------------------------------------------------------------------------------------------------------
> You will get output on PVH dom0:
> 
> [    0.181560] cjq_debug __irq_alloc_descs irq: 1, from: 1, cnt: 1, node: -1, start: 1, nr_irqs: 1096
> [    0.181639] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1
> [    0.181641] cjq_debug __irq_alloc_descs irq: 2, from: 2, cnt: 1, node: -1, start: 2, nr_irqs: 1096
> [    0.181682] cjq_debug mp_map_pin_to_irq gsi: 2, irq: 2, idx: 0, ioapic: 0, pin: 2
> [    0.181683] cjq_debug __irq_alloc_descs irq: 3, from: 3, cnt: 1, node: -1, start: 3, nr_irqs: 1096
> [    0.181715] cjq_debug mp_map_pin_to_irq gsi: 3, irq: 3, idx: 3, ioapic: 0, pin: 3
> [    0.181716] cjq_debug __irq_alloc_descs irq: 4, from: 4, cnt: 1, node: -1, start: 4, nr_irqs: 1096
> [    0.181751] cjq_debug mp_map_pin_to_irq gsi: 4, irq: 4, idx: 4, ioapic: 0, pin: 4
> [    0.181752] cjq_debug __irq_alloc_descs irq: 5, from: 5, cnt: 1, node: -1, start: 5, nr_irqs: 1096
> [    0.181783] cjq_debug mp_map_pin_to_irq gsi: 5, irq: 5, idx: 5, ioapic: 0, pin: 5
> [    0.181784] cjq_debug __irq_alloc_descs irq: 6, from: 6, cnt: 1, node: -1, start: 6, nr_irqs: 1096
> [    0.181813] cjq_debug mp_map_pin_to_irq gsi: 6, irq: 6, idx: 6, ioapic: 0, pin: 6
> [    0.181814] cjq_debug __irq_alloc_descs irq: 7, from: 7, cnt: 1, node: -1, start: 7, nr_irqs: 1096
> [    0.181856] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7
> [    0.181857] cjq_debug __irq_alloc_descs irq: 8, from: 8, cnt: 1, node: -1, start: 8, nr_irqs: 1096
> [    0.181888] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8
> [    0.181889] cjq_debug __irq_alloc_descs irq: 9, from: 9, cnt: 1, node: -1, start: 9, nr_irqs: 1096
> [    0.181918] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9
> [    0.181919] cjq_debug __irq_alloc_descs irq: 10, from: 10, cnt: 1, node: -1, start: 10, nr_irqs: 1096
> [    0.181950] cjq_debug mp_map_pin_to_irq gsi: 10, irq: 10, idx: 9, ioapic: 0, pin: 10
> [    0.181951] cjq_debug __irq_alloc_descs irq: 11, from: 11, cnt: 1, node: -1, start: 11, nr_irqs: 1096
> [    0.181977] cjq_debug mp_map_pin_to_irq gsi: 11, irq: 11, idx: 10, ioapic: 0, pin: 11
> [    0.181979] cjq_debug __irq_alloc_descs irq: 12, from: 12, cnt: 1, node: -1, start: 12, nr_irqs: 1096
> [    0.182006] cjq_debug mp_map_pin_to_irq gsi: 12, irq: 12, idx: 11, ioapic: 0, pin: 12
> [    0.182007] cjq_debug __irq_alloc_descs irq: 13, from: 13, cnt: 1, node: -1, start: 13, nr_irqs: 1096
> [    0.182034] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13
> [    0.182035] cjq_debug __irq_alloc_descs irq: 14, from: 14, cnt: 1, node: -1, start: 14, nr_irqs: 1096
> [    0.182066] cjq_debug mp_map_pin_to_irq gsi: 14, irq: 14, idx: 13, ioapic: 0, pin: 14
> [    0.182067] cjq_debug __irq_alloc_descs irq: 15, from: 15, cnt: 1, node: -1, start: 15, nr_irqs: 1096
> [    0.182095] cjq_debug mp_map_pin_to_irq gsi: 15, irq: 15, idx: 14, ioapic: 0, pin: 15
> [    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 24, nr_irqs: 1096
> [    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096
> [    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 26, nr_irqs: 1096
> [    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 27, nr_irqs: 1096
> [    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 28, nr_irqs: 1096
> [    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 29, nr_irqs: 1096
> [    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 30, nr_irqs: 1096
> [    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 31, nr_irqs: 1096
> [    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 32, nr_irqs: 1096
> [    0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 33, nr_irqs: 1096
> [    0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 34, nr_irqs: 1096
> [    0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 35, nr_irqs: 1096
> [    0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 36, nr_irqs: 1096
> [    0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 37, nr_irqs: 1096
> [    0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 38, nr_irqs: 1096
> [    0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 39, nr_irqs: 1096
> [    0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 40, nr_irqs: 1096
> [    0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 41, nr_irqs: 1096
> [    0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 42, nr_irqs: 1096
> [    0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 43, nr_irqs: 1096
> [    0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 44, nr_irqs: 1096
> [    0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 45, nr_irqs: 1096
> [    0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 46, nr_irqs: 1096
> [    0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 47, nr_irqs: 1096
> [    0.198199] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 48, nr_irqs: 1096
> [    0.198416] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 49, nr_irqs: 1096
> [    0.198460] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 50, nr_irqs: 1096
> [    0.198489] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 51, nr_irqs: 1096
> [    0.198523] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 52, nr_irqs: 1096
> [    0.201315] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 53, nr_irqs: 1096
> [    0.202174] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 54, nr_irqs: 1096
> [    0.202225] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 55, nr_irqs: 1096
> [    0.202259] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 56, nr_irqs: 1096
> [    0.202291] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 57, nr_irqs: 1096
> [    0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 58, nr_irqs: 1096
> [    0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 59, nr_irqs: 1096
> [    0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 60, nr_irqs: 1096
> [    0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 61, nr_irqs: 1096
> [    0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 62, nr_irqs: 1096
> [    0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 63, nr_irqs: 1096
> [    0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 64, nr_irqs: 1096
> [    0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 65, nr_irqs: 1096
> [    0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 66, nr_irqs: 1096
> [    0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 67, nr_irqs: 1096
> [    0.210169] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 68, nr_irqs: 1096
> [    0.210322] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 69, nr_irqs: 1096
> [    0.210370] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 70, nr_irqs: 1096
> [    0.210403] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 71, nr_irqs: 1096
> [    0.210436] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 72, nr_irqs: 1096
> [    0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 73, nr_irqs: 1096
> [    0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 74, nr_irqs: 1096
> [    0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 75, nr_irqs: 1096
> [    0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 76, nr_irqs: 1096
> [    0.214151] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 77, nr_irqs: 1096
> [    0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 78, nr_irqs: 1096
> [    0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 79, nr_irqs: 1096
> [    0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 80, nr_irqs: 1096
> [    0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 81, nr_irqs: 1096
> [    0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 82, nr_irqs: 1096
> [    0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 83, nr_irqs: 1096
> [    0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 84, nr_irqs: 1096
> [    0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 85, nr_irqs: 1096
> [    0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 86, nr_irqs: 1096
> [    0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 87, nr_irqs: 1096
> [    0.222215] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 88, nr_irqs: 1096
> [    0.222366] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 89, nr_irqs: 1096
> [    0.222410] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 90, nr_irqs: 1096
> [    0.222447] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 91, nr_irqs: 1096
> [    0.222478] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 92, nr_irqs: 1096
> [    0.225490] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 93, nr_irqs: 1096
> [    0.226225] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 94, nr_irqs: 1096
> [    0.226268] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 95, nr_irqs: 1096
> [    0.226300] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 96, nr_irqs: 1096
> [    0.226329] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 97, nr_irqs: 1096
> [    0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 98, nr_irqs: 1096
> [    0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 99, nr_irqs: 1096
> [    0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 100, nr_irqs: 1096
> [    0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 101, nr_irqs: 1096
> [    0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 102, nr_irqs: 1096
> [    0.232399] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 103, nr_irqs: 1096
> [    0.248854] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 104, nr_irqs: 1096
> [    0.250609] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 105, nr_irqs: 1096
> [    0.372343] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9
> [    0.720950] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8
> [    0.721052] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13
> [    1.254825] cjq_debug mp_map_pin_to_irq gsi: 7, irq: -16, idx: 7, ioapic: 0, pin: 7
> [    1.333081] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1
> [    1.375882] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 106, nr_irqs: 1096
> [    1.375951] cjq_debug mp_map_pin_to_irq gsi: 32, irq: 106, idx: -1, ioapic: 1, pin: 8
> [    1.376072] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096
> [    1.376121] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 107, idx: -1, ioapic: 1, pin: 13
> [    1.472551] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 107, idx: -1, ioapic: 1, pin: 13
> [    1.472697] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096
> [    1.472751] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 107, idx: -1, ioapic: 1, pin: 14
> [    1.484290] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 107, idx: -1, ioapic: 1, pin: 14
> [    1.768163] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096
> [    1.768627] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 108, nr_irqs: 1096
> [    1.769059] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 109, nr_irqs: 1096
> [    1.769694] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 110, nr_irqs: 1096
> [    1.770169] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 111, nr_irqs: 1096
> [    1.770697] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 112, nr_irqs: 1096
> [    1.770738] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 112, idx: -1, ioapic: 1, pin: 4
> [    1.770789] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 113, nr_irqs: 1096
> [    1.771230] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 112, idx: -1, ioapic: 1, pin: 4
> [    1.771278] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 114, nr_irqs: 1096
> [    2.127884] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 115, nr_irqs: 1096
> [    3.207419] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 116, nr_irqs: 1096
> [    3.207730] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13
> [    3.208120] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 117, nr_irqs: 1096
> [    3.208475] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12
> [    3.208478] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 118, nr_irqs: 1096
> [    3.208861] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13
> [    3.208933] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 119, nr_irqs: 1096
> [    3.209127] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 120, nr_irqs: 1096
> [    3.209383] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 121, nr_irqs: 1096
> [    3.209863] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 122, nr_irqs: 1096
> [    3.211439] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 123, nr_irqs: 1096
> [    3.211833] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 124, nr_irqs: 1096
> [    3.212873] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 125, nr_irqs: 1096
> [    3.243514] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 126, nr_irqs: 1096
> [    3.243689] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 126, idx: -1, ioapic: 1, pin: 14
> [    3.244293] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 127, nr_irqs: 1096
> [    3.244534] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 128, nr_irqs: 1096
> [    3.244714] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 129, nr_irqs: 1096
> [    3.244911] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 130, nr_irqs: 1096
> [    3.245096] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 131, nr_irqs: 1096
> [    3.245633] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 132, nr_irqs: 1096
> [    3.247890] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 133, nr_irqs: 1096
> [    3.248192] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 134, nr_irqs: 1096
> [    3.271093] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 135, nr_irqs: 1096
> [    3.307045] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 136, nr_irqs: 1096
> [    3.307162] cjq_debug mp_map_pin_to_irq gsi: 48, irq: 136, idx: -1, ioapic: 1, pin: 24
> [    3.307223] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 137, nr_irqs: 1096
> [    3.331183] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 137, nr_irqs: 1096
> [    3.331295] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 138, nr_irqs: 1096
> [    3.331366] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 139, nr_irqs: 1096
> [    3.331438] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 140, nr_irqs: 1096
> [    3.331511] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 141, nr_irqs: 1096
> [    3.331579] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 142, nr_irqs: 1096
> [    3.331646] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 143, nr_irqs: 1096
> [    3.331713] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 144, nr_irqs: 1096
> [    3.331780] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 145, nr_irqs: 1096
> [    3.331846] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 146, nr_irqs: 1096
> [    3.331913] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 147, nr_irqs: 1096
> [    3.331984] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 148, nr_irqs: 1096
> [    3.332051] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 149, nr_irqs: 1096
> [    3.332118] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 150, nr_irqs: 1096
> [    3.332183] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 151, nr_irqs: 1096
> [    3.332252] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 152, nr_irqs: 1096
> [    3.332319] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 153, nr_irqs: 1096
> [    8.010370] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13
> [    9.545439] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12
> [    9.545713] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 154, nr_irqs: 1096
> [    9.546034] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 155, nr_irqs: 1096
> [    9.687796] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 156, nr_irqs: 1096
> [    9.687979] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 156, idx: -1, ioapic: 1, pin: 15
> [    9.688057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 157, nr_irqs: 1096
> [    9.921038] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 158, nr_irqs: 1096
> [    9.921210] cjq_debug mp_map_pin_to_irq gsi: 29, irq: 158, idx: -1, ioapic: 1, pin: 5
> [    9.921403] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 159, nr_irqs: 1096
> [    9.926373] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 156, idx: -1, ioapic: 1, pin: 15
> [    9.926747] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 160, nr_irqs: 1096
> [    9.928201] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12
> [    9.928488] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 161, nr_irqs: 1096
> [   10.653915] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 162, nr_irqs: 1096
> [   10.656257] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 163, nr_irqs: 1096
> 
> You can see that the IRQ allocation is not always based on the GSI value. It follows a first-requested, first-allocated principle: for example, GSI 32 gets IRQ 106 but GSI 28 gets IRQ 112. And acpi_register_gsi_ioapic() is not the only caller of __irq_alloc_descs(); other functions call it too, even earlier.
> The output above (PVH dom0) behaves like baremetal, so we can conclude that irq != gsi. See the baremetal Linux output below:

It does seem weird to me that it does identity map legacy IRQs (<16),
but then for GSI >= 16 it starts assigning IRQs in the 100 range.

What uses the IRQ range [24, 105]?

Also IIRC on a PV dom0 GSIs are identity mapped to IRQs on Linux?  Or
maybe that's just a side effect of GSIs being identity mapped into
PIRQs by Xen?

> [    0.105053] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1
> [    0.105061] cjq_debug mp_map_pin_to_irq gsi: 2, irq: 0, idx: 0, ioapic: 0, pin: 2
> [    0.105069] cjq_debug mp_map_pin_to_irq gsi: 3, irq: 3, idx: 3, ioapic: 0, pin: 3
> [    0.105078] cjq_debug mp_map_pin_to_irq gsi: 4, irq: 4, idx: 4, ioapic: 0, pin: 4
> [    0.105086] cjq_debug mp_map_pin_to_irq gsi: 5, irq: 5, idx: 5, ioapic: 0, pin: 5
> [    0.105094] cjq_debug mp_map_pin_to_irq gsi: 6, irq: 6, idx: 6, ioapic: 0, pin: 6
> [    0.105103] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7
> [    0.105111] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8
> [    0.105119] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9
> [    0.105127] cjq_debug mp_map_pin_to_irq gsi: 10, irq: 10, idx: 9, ioapic: 0, pin: 10
> [    0.105136] cjq_debug mp_map_pin_to_irq gsi: 11, irq: 11, idx: 10, ioapic: 0, pin: 11
> [    0.105144] cjq_debug mp_map_pin_to_irq gsi: 12, irq: 12, idx: 11, ioapic: 0, pin: 12
> [    0.105152] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13
> [    0.105160] cjq_debug mp_map_pin_to_irq gsi: 14, irq: 14, idx: 13, ioapic: 0, pin: 14
> [    0.105169] cjq_debug mp_map_pin_to_irq gsi: 15, irq: 15, idx: 14, ioapic: 0, pin: 15
> [    0.398134] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9
> [    1.169293] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8
> [    1.169394] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13
> [    1.323132] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7
> [    1.345425] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1
> [    1.375502] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 24, nr_irqs: 1096
> [    1.375575] cjq_debug mp_map_pin_to_irq gsi: 32, irq: 24, idx: -1, ioapic: 1, pin: 8
> [    1.375661] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096
> [    1.375705] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 25, idx: -1, ioapic: 1, pin: 13
> [    1.442277] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 25, idx: -1, ioapic: 1, pin: 13
> [    1.442393] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096
> [    1.442450] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 25, idx: -1, ioapic: 1, pin: 14
> [    1.453893] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 25, idx: -1, ioapic: 1, pin: 14
> [    1.456127] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096
> [    1.734065] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 26, nr_irqs: 1096
> [    1.734165] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 27, nr_irqs: 1096
> [    1.734253] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 28, nr_irqs: 1096
> [    1.734344] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 29, nr_irqs: 1096
> [    1.734426] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 30, nr_irqs: 1096
> [    1.734512] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 31, nr_irqs: 1096
> [    1.734597] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 32, nr_irqs: 1096
> [    1.734643] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 33, nr_irqs: 1096
> [    1.734687] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 34, nr_irqs: 1096
> [    1.734728] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 35, nr_irqs: 1096
> [    1.735017] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 36, nr_irqs: 1096
> [    1.735252] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 37, nr_irqs: 1096
> [    1.735467] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 38, nr_irqs: 1096
> [    1.735799] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 39, nr_irqs: 1096
> [    1.736024] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 40, nr_irqs: 1096
> [    1.736364] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 41, nr_irqs: 1096
> [    1.736406] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 41, idx: -1, ioapic: 1, pin: 4
> [    1.736434] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 42, nr_irqs: 1096
> [    1.736701] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 41, idx: -1, ioapic: 1, pin: 4
> [    1.736724] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 43, nr_irqs: 1096
> [    3.037123] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 44, nr_irqs: 1096
> [    3.037313] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13
> [    3.037515] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13
> [    3.037738] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 45, nr_irqs: 1096
> [    3.037959] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 46, nr_irqs: 1096
> [    3.038073] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 47, nr_irqs: 1096
> [    3.038154] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 48, nr_irqs: 1096
> [    3.038179] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12
> [    3.038277] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 49, nr_irqs: 1096
> [    3.038399] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 50, nr_irqs: 1096
> [    3.038525] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 51, nr_irqs: 1096
> [    3.038657] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 52, nr_irqs: 1096
> [    3.038852] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 53, nr_irqs: 1096
> [    3.052377] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 54, nr_irqs: 1096
> [    3.052479] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 54, idx: -1, ioapic: 1, pin: 14
> [    3.052730] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 55, nr_irqs: 1096
> [    3.052840] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 56, nr_irqs: 1096
> [    3.052918] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 57, nr_irqs: 1096
> [    3.052987] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 58, nr_irqs: 1096
> [    3.053069] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 59, nr_irqs: 1096
> [    3.053139] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 60, nr_irqs: 1096
> [    3.053201] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 61, nr_irqs: 1096
> [    3.053260] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 62, nr_irqs: 1096
> [    3.089128] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 63, nr_irqs: 1096
> [    3.089310] cjq_debug mp_map_pin_to_irq gsi: 48, irq: 63, idx: -1, ioapic: 1, pin: 24
> [    3.089376] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 64, nr_irqs: 1096
> [    3.103435] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 65, nr_irqs: 1096
> [    3.114190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 64, nr_irqs: 1096
> [    3.114346] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 66, nr_irqs: 1096
> [    3.121215] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 67, nr_irqs: 1096
> [    3.121350] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 68, nr_irqs: 1096
> [    3.121479] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 69, nr_irqs: 1096
> [    3.121612] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 70, nr_irqs: 1096
> [    3.121726] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 71, nr_irqs: 1096
> [    3.121841] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 72, nr_irqs: 1096
> [    3.121955] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 73, nr_irqs: 1096
> [    3.122025] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 74, nr_irqs: 1096
> [    3.122093] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 75, nr_irqs: 1096
> [    3.122148] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 76, nr_irqs: 1096
> [    3.122203] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 77, nr_irqs: 1096
> [    3.122265] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 78, nr_irqs: 1096
> [    3.122322] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 79, nr_irqs: 1096
> [    3.122378] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 80, nr_irqs: 1096
> [    3.122433] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 81, nr_irqs: 1096
> [    7.838753] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13
> [    9.619174] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12
> [    9.619556] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 82, nr_irqs: 1096
> [    9.622038] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 83, nr_irqs: 1096
> [    9.634900] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 84, nr_irqs: 1096
> [    9.635316] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 84, idx: -1, ioapic: 1, pin: 15
> [    9.635405] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 85, nr_irqs: 1096
> [   10.006686] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 86, nr_irqs: 1096
> [   10.006823] cjq_debug mp_map_pin_to_irq gsi: 29, irq: 86, idx: -1, ioapic: 1, pin: 5
> [   10.007009] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 87, nr_irqs: 1096
> [   10.008723] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 84, idx: -1, ioapic: 1, pin: 15
> [   10.009853] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 88, nr_irqs: 1096
> [   10.010786] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12
> [   10.010858] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 89, nr_irqs: 1096
> 
> 2. Why do I translate between irq and gsi?
> 
> After answering question 1, we know irq != gsi. And I found that, in QEMU, (pci_qdev_realize->xen_pt_realize->xen_host_pci_device_get->xen_host_pci_get_hex_value) gets the IRQ number, but later pci_qdev_realize->xen_pt_realize->xc_physdev_map_pirq requires us to pass in the GSI,

So that's quite a difference.  For some reason on a PV dom0
xen_host_pci_get_hex_value will return the IRQ that's identity mapped
to the GSI.

Is that because a PV dom0 will use acpi_register_gsi_xen() instead of
acpi_register_gsi_ioapic()?
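
For reference, on a PV dom0 the acpi_register_gsi_xen() path requests an identity GSI-to-PIRQ mapping from Xen and then allocates the Linux IRQ descriptor at the GSI number, which is why IRQ == GSI there. A minimal sketch of that flow, loosely based on arch/x86/pci/xen.c (field names and error handling are simplified from memory and may not match the kernel exactly):

    /* Hedged sketch: how a PV dom0 ends up with IRQ == GSI == PIRQ.
     * Simplified from the xen_register_pirq() path; not the literal code. */
    static int pv_dom0_register_gsi_sketch(u32 gsi)
    {
        struct physdev_map_pirq map_irq = {
            .domid = DOMID_SELF,
            .type  = MAP_PIRQ_TYPE_GSI,
            .index = gsi,
            .pirq  = gsi,   /* dom0 asks Xen for an identity GSI -> PIRQ mapping */
        };

        if (HYPERVISOR_physdev_op(PHYSDEVOP_map_pirq, &map_irq))
            return -1;

        /* The Linux IRQ descriptor is then allocated at the GSI number,
         * so the resulting IRQ equals the GSI on a PV dom0. */
        return xen_bind_pirq_gsi_to_irq(gsi, map_irq.pirq, 0 /* shareable */, "gsi");
    }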

> it will call into Xen's physdev_map_pirq -> allocate_and_map_gsi_pirq to allocate a pirq for the GSI, and that is where the error occurred.
> Not only that, the callback function pci_add_dm_done -> xc_physdev_map_pirq also needs the GSI.
> 
> So, I added the function xc_physdev_gsi_from_irq() to translate the IRQ to the GSI for QEMU.
> 
> And I didn't find a similar function in the existing Linux code, and I think only "QEMU passthrough for Xen" needs this translation, so I added it into privcmd. If you know of any similar existing functions or a more suitable place, please feel free to tell me.
> 
> 3. Why do I call PHYSDEVOP_map_pirq in acpi_register_gsi_xen_pvh()?
> 
> Because if you want to map a GSI for a domU, it must already have a mapping in dom0. See the toolstack (libxl) code:
> pci_add_dm_done
> 	xc_physdev_map_pirq
> 	xc_domain_irq_permission
> 		XEN_DOMCTL_irq_permission
> 			pirq_access_permitted
> xc_physdev_map_pirq will get the pirq mapped from the GSI, and xc_domain_irq_permission will use that pirq to call into Xen. If we don't do PHYSDEVOP_map_pirq for passthrough devices on PVH dom0, then pirq_access_permitted will find no irq mapping in dom0 and the call will fail.
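
For illustration, a rough sketch of that toolstack-side sequence (call names as used in this thread; the helper name is made up, and exact libxc prototypes and error handling vary between Xen versions and are simplified here):

    #include <xenctrl.h>

    /* Hedged sketch of the map + permission flow described above. */
    static int grant_gsi_to_guest_sketch(xc_interface *xch, uint32_t domid, int gsi)
    {
        int pirq = gsi;   /* in/out parameter: on success holds the mapped pirq */
        int rc = xc_physdev_map_pirq(xch, domid, gsi, &pirq);

        if (rc < 0)
            return rc;    /* fails when the GSI was never mapped/set up in dom0 */

        /* XEN_DOMCTL_irq_permission -> pirq_access_permitted() looks the pirq
         * up in dom0's table, which is why the dom0-side mapping must exist. */
        return xc_domain_irq_permission(xch, domid, pirq, 1);
    }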

I'm not sure of this specific case, but we shouldn't attempt to fit
the same exact PCI pass through workflow that a PV dom0 uses into a
PVH dom0.  IOW: it might make sense to diverge some paths in order to
avoid importing PV specific concepts into PVH without a reason.

> So, I added PHYSDEVOP_map_pirq for PVH dom0. But I think it is only necessary for passthrough devices, not for all devices that call __acpi_register_gsi. In the next version of the patch, I will restrict PHYSDEVOP_map_pirq to passthrough devices only.
> 
> 4. Why do I call PHYSDEVOP_setup_gsi in acpi_register_gsi_xen_pvh()?
> 
> As Roger commented, the GSI of a passthrough device doesn't get unmasked and registered (I added printks in vioapic_hwdom_map_gsi() and found that it is never called for the dGPU with GSI 28 in my environment).
> So, I called PHYSDEVOP_setup_gsi to register the GSI.
> But I agree with Roger's and Jan's opinion that it is wrong to do PHYSDEVOP_setup_gsi for all devices.
> So, in the next version of the patch, I will also restrict PHYSDEVOP_setup_gsi to passthrough devices only.
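
For illustration, a minimal sketch of the two physdev operations being discussed, as they might be issued from a PVH dom0 kernel for a passthrough device (the function name and the passthrough-only policy are placeholders; the hypercall structures are the public Xen physdev ABI):

    /* Hedged sketch only: register the GSI with Xen and give dom0 a pirq
     * for it, so a later XEN_DOMCTL_irq_permission can succeed. */
    static int pvh_setup_and_map_gsi_sketch(int gsi, int trigger, int polarity)
    {
        struct physdev_setup_gsi setup_gsi = {
            .gsi        = gsi,
            .triggering = (trigger == ACPI_EDGE_SENSITIVE) ? 0 : 1,
            .polarity   = (polarity == ACPI_ACTIVE_HIGH) ? 0 : 1,
        };
        struct physdev_map_pirq map_irq = {
            .domid = DOMID_SELF,
            .type  = MAP_PIRQ_TYPE_GSI,
            .index = gsi,
            .pirq  = gsi,
        };
        int rc = HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi);

        if (rc && rc != -EEXIST)    /* -EEXIST: GSI already set up */
            return rc;

        return HYPERVISOR_physdev_op(PHYSDEVOP_map_pirq, &map_irq);
    }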

Right, given how long it's been since the last series, I think we need
a new series posted in order to see how this looks now.

Thanks, Roger.
Jiqian Chen Aug. 31, 2023, 8:56 a.m. UTC | #19
Thanks Roger, we will send a new series after the Xen 4.18 code freeze.

On 2023/8/23 16:57, Roger Pau Monné wrote:
> On Mon, Jul 31, 2023 at 04:40:35PM +0000, Chen, Jiqian wrote:
>> Hi,
>>
>> On 2023/3/18 04:55, Stefano Stabellini wrote:
>>> On Fri, 17 Mar 2023, Roger Pau Monné wrote:
>>>> On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote:
>>>>> On Fri, 17 Mar 2023, Roger Pau Monné wrote:
>>>>>> On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote:
>>>>>>> On 17.03.2023 00:19, Stefano Stabellini wrote:
>>>>>>>> On Thu, 16 Mar 2023, Jan Beulich wrote:
>>>>>>>>> So yes, it then all boils down to that Linux-
>>>>>>>>> internal question.
>>>>>>>>
>>>>>>>> Excellent question but we'll have to wait for Ray as he is the one with
>>>>>>>> access to the hardware. But I have this data I can share in the
>>>>>>>> meantime:
>>>>>>>>
>>>>>>>> [    1.260378] IRQ to pin mappings:
>>>>>>>> [    1.260387] IRQ1 -> 0:1
>>>>>>>> [    1.260395] IRQ2 -> 0:2
>>>>>>>> [    1.260403] IRQ3 -> 0:3
>>>>>>>> [    1.260410] IRQ4 -> 0:4
>>>>>>>> [    1.260418] IRQ5 -> 0:5
>>>>>>>> [    1.260425] IRQ6 -> 0:6
>>>>>>>> [    1.260432] IRQ7 -> 0:7
>>>>>>>> [    1.260440] IRQ8 -> 0:8
>>>>>>>> [    1.260447] IRQ9 -> 0:9
>>>>>>>> [    1.260455] IRQ10 -> 0:10
>>>>>>>> [    1.260462] IRQ11 -> 0:11
>>>>>>>> [    1.260470] IRQ12 -> 0:12
>>>>>>>> [    1.260478] IRQ13 -> 0:13
>>>>>>>> [    1.260485] IRQ14 -> 0:14
>>>>>>>> [    1.260493] IRQ15 -> 0:15
>>>>>>>> [    1.260505] IRQ106 -> 1:8
>>>>>>>> [    1.260513] IRQ112 -> 1:4
>>>>>>>> [    1.260521] IRQ116 -> 1:13
>>>>>>>> [    1.260529] IRQ117 -> 1:14
>>>>>>>> [    1.260537] IRQ118 -> 1:15
>>>>>>>> [    1.260544] .................................... done.
>>>>>>>
>>>>>>> And what does Linux think are IRQs 16 ... 105? Have you compared with
>>>>>>> Linux running baremetal on the same hardware?
>>>>>>
>>>>>> So I have some emails from Ray from he time he was looking into this,
>>>>>> and on Linux dom0 PVH dmesg there is:
>>>>>>
>>>>>> [    0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23
>>>>>> [    0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55
>>>>>>
>>>>>> So it seems the vIO-APIC data provided by Xen to dom0 is at least
>>>>>> consistent.
>>>>>>  
>>>>>>>> And I think Ray traced the point in Linux where Linux gives us an IRQ ==
>>>>>>>> 112 (which is the one causing issues):
>>>>>>>>
>>>>>>>> __acpi_register_gsi->
>>>>>>>>         acpi_register_gsi_ioapic->
>>>>>>>>                 mp_map_gsi_to_irq->
>>>>>>>>                         mp_map_pin_to_irq->
>>>>>>>>                                 __irq_resolve_mapping()
>>>>>>>>
>>>>>>>>         if (likely(data)) {
>>>>>>>>                 desc = irq_data_to_desc(data);
>>>>>>>>                 if (irq)
>>>>>>>>                         *irq = data->irq;
>>>>>>>>                 /* this IRQ is 112, IO-APIC-34 domain */
>>>>>>>>         }
>>>>>>
>>>>>>
>>>>>> Could this all be a result of patch 4/5 in the Linux series ("[RFC
>>>>>> PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different
>>>>>> __acpi_register_gsi hook is installed for PVH in order to setup GSIs
>>>>>> using PHYSDEV ops instead of doing it natively from the IO-APIC?
>>>>>>
>>>>>> FWIW, the introduced function in that patch
>>>>>> (acpi_register_gsi_xen_pvh()) seems to unconditionally call
>>>>>> acpi_register_gsi_ioapic() without checking if the GSI is already
>>>>>> registered, which might lead to multiple IRQs being allocated for the
>>>>>> same underlying GSI?
>>>>>
>>>>> I understand this point and I think it needs investigating.
>>>>>
>>>>>
>>>>>> As I commented there, I think that approach is wrong.  If the GSI has
>>>>>> not been mapped in Xen (because dom0 hasn't unmasked the respective
>>>>>> IO-APIC pin) we should add some logic in the toolstack to map it
>>>>>> before attempting to bind.
>>>>>
>>>>> But this statement confuses me. The toolstack doesn't get involved in
>>>>> IRQ setup for PCI devices for HVM guests?
>>>>
>>>> It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call
>>>> to xc_physdev_map_pirq().  I'm not sure whether that's a remnant that
>>>> could be removed (maybe for qemu-trad only?) or it's also required by
>>>> QEMU upstream, I would have to investigate more.
>>>
>>> You are right. I am not certain, but it seems like a mistake in the
>>> toolstack to me. In theory, pci_add_dm_done should only be needed for PV
>>> guests, not for HVM guests. I am not sure. But I can see the call to
>>> xc_physdev_map_pirq you were referring to now.
>>>
>>>
>>>> It's my understanding it's in pci_add_dm_done() where Ray was getting
>>>> the mismatched IRQ vs GSI number.
>>>
>>> I think the mismatch was actually caused by the xc_physdev_map_pirq call
>>> from QEMU, which makes sense because in any case it should happen before
>>> the same call done by pci_add_dm_done (pci_add_dm_done is called after
>>> sending the pci passthrough QMP command to QEMU). So the first to hit
>>> the IRQ!=GSI problem would be QEMU.
>>
>>
>> Sorry for replying so late, and thank you all for the review. I realized that your questions mainly focus on the following points: 1. Why is irq not equal to gsi? 2. Why do I translate between irq and gsi? 3. Why do I call PHYSDEVOP_map_pirq in acpi_register_gsi_xen_pvh()? 4. Why do I call PHYSDEVOP_setup_gsi in acpi_register_gsi_xen_pvh()?
>> Please forgive me for giving a summary response first. I am looking forward to your comments.
> 
> Sorry, it's been a bit since that conversation, so my recollection is
> vague.
> 
> One of the questions was why acpi_register_gsi_xen_pvh() is needed.  I
> think the patch that introduced it on Linux didn't have much of a
> commit description.
PVH and baremetal both use acpi_register_gsi_ioapic to allocate an IRQ for a GSI. I added the function acpi_register_gsi_xen_pvh to replace acpi_register_gsi_ioapic on PVH, so that I can do PVH-specific things there, like map_pirq, setup_gsi, etc.

> 
>> 1. Why is irq not equal to gsi?
>> As far as I know, the IRQ is dynamically allocated for a GSI; they are not necessarily equal.
>> When I run "sudo xl pci-assignable-add 03:00.0" to assign a passthrough device (taking the dGPU in my environment, whose GSI is 28, as an example), it calls into acpi_register_gsi_ioapic to get an IRQ; the call stack is:
>> acpi_register_gsi_ioapic
>> 	mp_map_gsi_to_irq
>> 		mp_map_pin_to_irq
>> 			irq_find_mapping(if gsi has been mapped to an irq before, it will return corresponding irq here)
>> 			alloc_irq_from_domain
>> 				__irq_domain_alloc_irqs
>> 					irq_domain_alloc_descs
>> 						__irq_alloc_descs
> 
> Won't you perform double GSI registrations with Xen if both
> acpi_register_gsi_ioapic() and acpi_register_gsi_xen_pvh() are used?
In the original PVH code, __acpi_register_gsi is set to acpi_register_gsi_ioapic in the call stack start_kernel->setup_arch->acpi_boot_init->acpi_process_madt->acpi_set_irq_model_ioapic.
In my code, I replace acpi_register_gsi_ioapic with acpi_register_gsi_xen_pvh in the call stack start_kernel-> init_IRQ-> xen_init_IRQ-> pci_xen_pvh_init.
So acpi_register_gsi_ioapic will be called only once.
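
A minimal sketch of what that hook replacement could look like (names follow the description in this thread, not necessarily the actual RFC patch; it also assumes acpi_register_gsi_ioapic() is made callable from the Xen init code):

    /* Hedged sketch of the PVH-specific __acpi_register_gsi hook. */
    static int acpi_register_gsi_xen_pvh(struct device *dev, u32 gsi,
                                         int trigger, int polarity)
    {
        /* Reuse the native IO-APIC path to allocate the Linux IRQ... */
        int irq = acpi_register_gsi_ioapic(dev, gsi, trigger, polarity);

        if (irq < 0)
            return irq;

        /* ...then do the PVH extras (PHYSDEVOP_setup_gsi / PHYSDEVOP_map_pirq),
         * intended to be restricted to passthrough devices in later versions. */
        return irq;
    }

    void __init pci_xen_pvh_init(void)
    {
        /* Installed once from xen_init_IRQ(), replacing the hook that
         * acpi_set_irq_model_ioapic() set earlier during setup_arch(). */
        __acpi_register_gsi = acpi_register_gsi_xen_pvh;
    }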

> 
>>
>> If you add some printings like below:
>> ---------------------------------------------------------------------------------------------------------------------------------------------
>> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
>> index a868b76cd3d4..970fd461be7a 100644
>> --- a/arch/x86/kernel/apic/io_apic.c
>> +++ b/arch/x86/kernel/apic/io_apic.c
>> @@ -1067,6 +1067,8 @@ static int mp_map_pin_to_irq(u32 gsi, int idx, int ioapic, int pin,
>>                 }
>>         }
>>         mutex_unlock(&ioapic_mutex);
>> +       printk("cjq_debug mp_map_pin_to_irq gsi: %u, irq: %d, idx: %d, ioapic: %d, pin: %d\n",
>> +                       gsi, irq, idx, ioapic, pin);
>>
>>         return irq;
>>  }
>> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
>> index 5db0230aa6b5..4e9613abbe96 100644
>> --- a/kernel/irq/irqdesc.c
>> +++ b/kernel/irq/irqdesc.c
>> @@ -786,6 +786,8 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
>>         start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS,
>>                                            from, cnt, 0);
>>         ret = -EEXIST;
>> +       printk("cjq_debug __irq_alloc_descs irq: %d, from: %u, cnt: %u, node: %d, start: %d, nr_irqs: %d\n",
>> +                       irq, from, cnt, node, start, nr_irqs);
>>         if (irq >=0 && start != irq)
>>                 goto unlock;
>> ---------------------------------------------------------------------------------------------------------------------------------------------
>> You will get the following output on PVH dom0:
>>
>> [    0.181560] cjq_debug __irq_alloc_descs irq: 1, from: 1, cnt: 1, node: -1, start: 1, nr_irqs: 1096
>> [    0.181639] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1
>> [    0.181641] cjq_debug __irq_alloc_descs irq: 2, from: 2, cnt: 1, node: -1, start: 2, nr_irqs: 1096
>> [    0.181682] cjq_debug mp_map_pin_to_irq gsi: 2, irq: 2, idx: 0, ioapic: 0, pin: 2
>> [    0.181683] cjq_debug __irq_alloc_descs irq: 3, from: 3, cnt: 1, node: -1, start: 3, nr_irqs: 1096
>> [    0.181715] cjq_debug mp_map_pin_to_irq gsi: 3, irq: 3, idx: 3, ioapic: 0, pin: 3
>> [    0.181716] cjq_debug __irq_alloc_descs irq: 4, from: 4, cnt: 1, node: -1, start: 4, nr_irqs: 1096
>> [    0.181751] cjq_debug mp_map_pin_to_irq gsi: 4, irq: 4, idx: 4, ioapic: 0, pin: 4
>> [    0.181752] cjq_debug __irq_alloc_descs irq: 5, from: 5, cnt: 1, node: -1, start: 5, nr_irqs: 1096
>> [    0.181783] cjq_debug mp_map_pin_to_irq gsi: 5, irq: 5, idx: 5, ioapic: 0, pin: 5
>> [    0.181784] cjq_debug __irq_alloc_descs irq: 6, from: 6, cnt: 1, node: -1, start: 6, nr_irqs: 1096
>> [    0.181813] cjq_debug mp_map_pin_to_irq gsi: 6, irq: 6, idx: 6, ioapic: 0, pin: 6
>> [    0.181814] cjq_debug __irq_alloc_descs irq: 7, from: 7, cnt: 1, node: -1, start: 7, nr_irqs: 1096
>> [    0.181856] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7
>> [    0.181857] cjq_debug __irq_alloc_descs irq: 8, from: 8, cnt: 1, node: -1, start: 8, nr_irqs: 1096
>> [    0.181888] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8
>> [    0.181889] cjq_debug __irq_alloc_descs irq: 9, from: 9, cnt: 1, node: -1, start: 9, nr_irqs: 1096
>> [    0.181918] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9
>> [    0.181919] cjq_debug __irq_alloc_descs irq: 10, from: 10, cnt: 1, node: -1, start: 10, nr_irqs: 1096
>> [    0.181950] cjq_debug mp_map_pin_to_irq gsi: 10, irq: 10, idx: 9, ioapic: 0, pin: 10
>> [    0.181951] cjq_debug __irq_alloc_descs irq: 11, from: 11, cnt: 1, node: -1, start: 11, nr_irqs: 1096
>> [    0.181977] cjq_debug mp_map_pin_to_irq gsi: 11, irq: 11, idx: 10, ioapic: 0, pin: 11
>> [    0.181979] cjq_debug __irq_alloc_descs irq: 12, from: 12, cnt: 1, node: -1, start: 12, nr_irqs: 1096
>> [    0.182006] cjq_debug mp_map_pin_to_irq gsi: 12, irq: 12, idx: 11, ioapic: 0, pin: 12
>> [    0.182007] cjq_debug __irq_alloc_descs irq: 13, from: 13, cnt: 1, node: -1, start: 13, nr_irqs: 1096
>> [    0.182034] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13
>> [    0.182035] cjq_debug __irq_alloc_descs irq: 14, from: 14, cnt: 1, node: -1, start: 14, nr_irqs: 1096
>> [    0.182066] cjq_debug mp_map_pin_to_irq gsi: 14, irq: 14, idx: 13, ioapic: 0, pin: 14
>> [    0.182067] cjq_debug __irq_alloc_descs irq: 15, from: 15, cnt: 1, node: -1, start: 15, nr_irqs: 1096
>> [    0.182095] cjq_debug mp_map_pin_to_irq gsi: 15, irq: 15, idx: 14, ioapic: 0, pin: 15
>> [    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 24, nr_irqs: 1096
>> [    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096
>> [    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 26, nr_irqs: 1096
>> [    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 27, nr_irqs: 1096
>> [    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 28, nr_irqs: 1096
>> [    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 29, nr_irqs: 1096
>> [    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 30, nr_irqs: 1096
>> [    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 31, nr_irqs: 1096
>> [    0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 32, nr_irqs: 1096
>> [    0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 33, nr_irqs: 1096
>> [    0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 34, nr_irqs: 1096
>> [    0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 35, nr_irqs: 1096
>> [    0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 36, nr_irqs: 1096
>> [    0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 37, nr_irqs: 1096
>> [    0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 38, nr_irqs: 1096
>> [    0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 39, nr_irqs: 1096
>> [    0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 40, nr_irqs: 1096
>> [    0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 41, nr_irqs: 1096
>> [    0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 42, nr_irqs: 1096
>> [    0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 43, nr_irqs: 1096
>> [    0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 44, nr_irqs: 1096
>> [    0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 45, nr_irqs: 1096
>> [    0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 46, nr_irqs: 1096
>> [    0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 47, nr_irqs: 1096
>> [    0.198199] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 48, nr_irqs: 1096
>> [    0.198416] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 49, nr_irqs: 1096
>> [    0.198460] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 50, nr_irqs: 1096
>> [    0.198489] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 51, nr_irqs: 1096
>> [    0.198523] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 52, nr_irqs: 1096
>> [    0.201315] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 53, nr_irqs: 1096
>> [    0.202174] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 54, nr_irqs: 1096
>> [    0.202225] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 55, nr_irqs: 1096
>> [    0.202259] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 56, nr_irqs: 1096
>> [    0.202291] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 57, nr_irqs: 1096
>> [    0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 58, nr_irqs: 1096
>> [    0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 59, nr_irqs: 1096
>> [    0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 60, nr_irqs: 1096
>> [    0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 61, nr_irqs: 1096
>> [    0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 62, nr_irqs: 1096
>> [    0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 63, nr_irqs: 1096
>> [    0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 64, nr_irqs: 1096
>> [    0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 65, nr_irqs: 1096
>> [    0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 66, nr_irqs: 1096
>> [    0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 67, nr_irqs: 1096
>> [    0.210169] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 68, nr_irqs: 1096
>> [    0.210322] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 69, nr_irqs: 1096
>> [    0.210370] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 70, nr_irqs: 1096
>> [    0.210403] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 71, nr_irqs: 1096
>> [    0.210436] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 72, nr_irqs: 1096
>> [    0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 73, nr_irqs: 1096
>> [    0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 74, nr_irqs: 1096
>> [    0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 75, nr_irqs: 1096
>> [    0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 76, nr_irqs: 1096
>> [    0.214151] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 77, nr_irqs: 1096
>> [    0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 78, nr_irqs: 1096
>> [    0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 79, nr_irqs: 1096
>> [    0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 80, nr_irqs: 1096
>> [    0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 81, nr_irqs: 1096
>> [    0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 82, nr_irqs: 1096
>> [    0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 83, nr_irqs: 1096
>> [    0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 84, nr_irqs: 1096
>> [    0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 85, nr_irqs: 1096
>> [    0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 86, nr_irqs: 1096
>> [    0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 87, nr_irqs: 1096
>> [    0.222215] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 88, nr_irqs: 1096
>> [    0.222366] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 89, nr_irqs: 1096
>> [    0.222410] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 90, nr_irqs: 1096
>> [    0.222447] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 91, nr_irqs: 1096
>> [    0.222478] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 92, nr_irqs: 1096
>> [    0.225490] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 93, nr_irqs: 1096
>> [    0.226225] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 94, nr_irqs: 1096
>> [    0.226268] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 95, nr_irqs: 1096
>> [    0.226300] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 96, nr_irqs: 1096
>> [    0.226329] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 97, nr_irqs: 1096
>> [    0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 98, nr_irqs: 1096
>> [    0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 99, nr_irqs: 1096
>> [    0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 100, nr_irqs: 1096
>> [    0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 101, nr_irqs: 1096
>> [    0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 102, nr_irqs: 1096
>> [    0.232399] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 103, nr_irqs: 1096
>> [    0.248854] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 104, nr_irqs: 1096
>> [    0.250609] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 105, nr_irqs: 1096
>> [    0.372343] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9
>> [    0.720950] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8
>> [    0.721052] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13
>> [    1.254825] cjq_debug mp_map_pin_to_irq gsi: 7, irq: -16, idx: 7, ioapic: 0, pin: 7
>> [    1.333081] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1
>> [    1.375882] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 106, nr_irqs: 1096
>> [    1.375951] cjq_debug mp_map_pin_to_irq gsi: 32, irq: 106, idx: -1, ioapic: 1, pin: 8
>> [    1.376072] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096
>> [    1.376121] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 107, idx: -1, ioapic: 1, pin: 13
>> [    1.472551] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 107, idx: -1, ioapic: 1, pin: 13
>> [    1.472697] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096
>> [    1.472751] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 107, idx: -1, ioapic: 1, pin: 14
>> [    1.484290] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 107, idx: -1, ioapic: 1, pin: 14
>> [    1.768163] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096
>> [    1.768627] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 108, nr_irqs: 1096
>> [    1.769059] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 109, nr_irqs: 1096
>> [    1.769694] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 110, nr_irqs: 1096
>> [    1.770169] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 111, nr_irqs: 1096
>> [    1.770697] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 112, nr_irqs: 1096
>> [    1.770738] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 112, idx: -1, ioapic: 1, pin: 4
>> [    1.770789] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 113, nr_irqs: 1096
>> [    1.771230] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 112, idx: -1, ioapic: 1, pin: 4
>> [    1.771278] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 114, nr_irqs: 1096
>> [    2.127884] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 115, nr_irqs: 1096
>> [    3.207419] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 116, nr_irqs: 1096
>> [    3.207730] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13
>> [    3.208120] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 117, nr_irqs: 1096
>> [    3.208475] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12
>> [    3.208478] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 118, nr_irqs: 1096
>> [    3.208861] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13
>> [    3.208933] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 119, nr_irqs: 1096
>> [    3.209127] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 120, nr_irqs: 1096
>> [    3.209383] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 121, nr_irqs: 1096
>> [    3.209863] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 122, nr_irqs: 1096
>> [    3.211439] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 123, nr_irqs: 1096
>> [    3.211833] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 124, nr_irqs: 1096
>> [    3.212873] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 125, nr_irqs: 1096
>> [    3.243514] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 126, nr_irqs: 1096
>> [    3.243689] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 126, idx: -1, ioapic: 1, pin: 14
>> [    3.244293] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 127, nr_irqs: 1096
>> [    3.244534] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 128, nr_irqs: 1096
>> [    3.244714] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 129, nr_irqs: 1096
>> [    3.244911] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 130, nr_irqs: 1096
>> [    3.245096] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 131, nr_irqs: 1096
>> [    3.245633] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 132, nr_irqs: 1096
>> [    3.247890] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 133, nr_irqs: 1096
>> [    3.248192] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 134, nr_irqs: 1096
>> [    3.271093] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 135, nr_irqs: 1096
>> [    3.307045] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 136, nr_irqs: 1096
>> [    3.307162] cjq_debug mp_map_pin_to_irq gsi: 48, irq: 136, idx: -1, ioapic: 1, pin: 24
>> [    3.307223] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 137, nr_irqs: 1096
>> [    3.331183] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 137, nr_irqs: 1096
>> [    3.331295] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 138, nr_irqs: 1096
>> [    3.331366] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 139, nr_irqs: 1096
>> [    3.331438] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 140, nr_irqs: 1096
>> [    3.331511] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 141, nr_irqs: 1096
>> [    3.331579] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 142, nr_irqs: 1096
>> [    3.331646] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 143, nr_irqs: 1096
>> [    3.331713] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 144, nr_irqs: 1096
>> [    3.331780] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 145, nr_irqs: 1096
>> [    3.331846] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 146, nr_irqs: 1096
>> [    3.331913] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 147, nr_irqs: 1096
>> [    3.331984] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 148, nr_irqs: 1096
>> [    3.332051] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 149, nr_irqs: 1096
>> [    3.332118] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 150, nr_irqs: 1096
>> [    3.332183] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 151, nr_irqs: 1096
>> [    3.332252] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 152, nr_irqs: 1096
>> [    3.332319] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 153, nr_irqs: 1096
>> [    8.010370] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13
>> [    9.545439] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12
>> [    9.545713] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 154, nr_irqs: 1096
>> [    9.546034] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 155, nr_irqs: 1096
>> [    9.687796] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 156, nr_irqs: 1096
>> [    9.687979] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 156, idx: -1, ioapic: 1, pin: 15
>> [    9.688057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 157, nr_irqs: 1096
>> [    9.921038] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 158, nr_irqs: 1096
>> [    9.921210] cjq_debug mp_map_pin_to_irq gsi: 29, irq: 158, idx: -1, ioapic: 1, pin: 5
>> [    9.921403] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 159, nr_irqs: 1096
>> [    9.926373] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 156, idx: -1, ioapic: 1, pin: 15
>> [    9.926747] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 160, nr_irqs: 1096
>> [    9.928201] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12
>> [    9.928488] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 161, nr_irqs: 1096
>> [   10.653915] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 162, nr_irqs: 1096
>> [   10.656257] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 163, nr_irqs: 1096
>>
>> You can see that the irq allocation is not based on the gsi value; it is first come, first served, so for example gsi 32 gets irq 106 while gsi 28 gets irq 112. And acpi_register_gsi_ioapic() is not the only caller of __irq_alloc_descs(); other functions call it as well, some of them earlier.
>> The output above behaves like bare metal, so we can conclude that irq != gsi. See the output below, taken on Linux:
> 
> It does seem weird to me that it does identity map legacy IRQs (<16),
> but then for GSI >= 16 it starts assigning IRQs in the 100 range.
> 
> What uses the IRQ range [24, 105]?
They are allocated to IPIs, MSIs and event channels, which call __irq_alloc_descs() before the PCI devices do. For example, here is the callstack for one IPI:
kernel_init
	kernel_init_freeable
		smp_prepare_cpus
			smp_ops.smp_prepare_cpus
				xen_hvm_smp_prepare_cpus
					xen_smp_intr_init
						bind_ipi_to_irqhandler
							bind_ipi_to_irq
								xen_allocate_irq_dynamic
									__irq_alloc_descs

> 
> Also IIRC on a PV dom0 GSIs are identity mapped to IRQs on Linux?  Or
> maybe that's just a side effect of GSIs being identity mapped into
> PIRQs by Xen?
PV is different. Although IPIs are also set up before the PCI devices there, they do not occupy irqs 24~56. On a PV dom0, setup_IO_APIC is never called from start_kernel, so the variable "ioapic_initialized" checked in arch_dynirq_lower_bound stays unset and the function returns gsi_top, which is 56; dynamic irq allocation therefore starts from 56. (PVH and bare metal do initialize "ioapic_initialized", so arch_dynirq_lower_bound returns ioapic_dynirq_base, which is 24.) Furthermore, when PV allocates an irq for a PCI device it goes through acpi_register_gsi_xen->irq_alloc_desc_at->__irq_alloc_descs, and irq_alloc_desc_at passes the gsi to __irq_alloc_descs (PVH and bare metal pass -1). So in __irq_alloc_descs the variable "from" equals the gsi, the gsi lies in 24~56, none of those irqs was taken earlier, and the function returns an irq equal to the gsi.
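To make that concrete, here is a rough sketch of the lower-bound decision described above. It paraphrases the behaviour of arch_dynirq_lower_bound() using the variables already named (ioapic_initialized, gsi_top, ioapic_dynirq_base), passed as parameters here only so the snippet stands alone; it is not a verbatim copy of the mainline code:

#include <stdbool.h>

/*
 * Sketch of the lower bound __irq_alloc_descs() applies when the caller
 * does not request a fixed irq (irq == -1):
 *
 * - A PV dom0 never runs setup_IO_APIC(), so ioapic_initialized stays
 *   false and dynamic irqs start from gsi_top (56 here); irqs 24~55 are
 *   left free and can later be handed out equal to the gsi via
 *   irq_alloc_desc_at(gsi, ...).
 * - PVH dom0 and bare metal do initialize the IO-APIC, so dynamic irqs
 *   start from ioapic_dynirq_base (24 here) and quickly consume the
 *   range the GSIs would otherwise have landed on.
 */
static unsigned int dynirq_lower_bound_sketch(bool ioapic_initialized,
                                              unsigned int gsi_top,
                                              unsigned int ioapic_dynirq_base,
                                              unsigned int from)
{
    if (!ioapic_initialized)
        return gsi_top;

    return ioapic_dynirq_base ? ioapic_dynirq_base : from;
}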

> 
>> [    0.105053] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1
>> [    0.105061] cjq_debug mp_map_pin_to_irq gsi: 2, irq: 0, idx: 0, ioapic: 0, pin: 2
>> [    0.105069] cjq_debug mp_map_pin_to_irq gsi: 3, irq: 3, idx: 3, ioapic: 0, pin: 3
>> [    0.105078] cjq_debug mp_map_pin_to_irq gsi: 4, irq: 4, idx: 4, ioapic: 0, pin: 4
>> [    0.105086] cjq_debug mp_map_pin_to_irq gsi: 5, irq: 5, idx: 5, ioapic: 0, pin: 5
>> [    0.105094] cjq_debug mp_map_pin_to_irq gsi: 6, irq: 6, idx: 6, ioapic: 0, pin: 6
>> [    0.105103] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7
>> [    0.105111] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8
>> [    0.105119] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9
>> [    0.105127] cjq_debug mp_map_pin_to_irq gsi: 10, irq: 10, idx: 9, ioapic: 0, pin: 10
>> [    0.105136] cjq_debug mp_map_pin_to_irq gsi: 11, irq: 11, idx: 10, ioapic: 0, pin: 11
>> [    0.105144] cjq_debug mp_map_pin_to_irq gsi: 12, irq: 12, idx: 11, ioapic: 0, pin: 12
>> [    0.105152] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13
>> [    0.105160] cjq_debug mp_map_pin_to_irq gsi: 14, irq: 14, idx: 13, ioapic: 0, pin: 14
>> [    0.105169] cjq_debug mp_map_pin_to_irq gsi: 15, irq: 15, idx: 14, ioapic: 0, pin: 15
>> [    0.398134] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9
>> [    1.169293] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8
>> [    1.169394] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13
>> [    1.323132] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7
>> [    1.345425] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1
>> [    1.375502] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 24, nr_irqs: 1096
>> [    1.375575] cjq_debug mp_map_pin_to_irq gsi: 32, irq: 24, idx: -1, ioapic: 1, pin: 8
>> [    1.375661] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096
>> [    1.375705] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 25, idx: -1, ioapic: 1, pin: 13
>> [    1.442277] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 25, idx: -1, ioapic: 1, pin: 13
>> [    1.442393] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096
>> [    1.442450] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 25, idx: -1, ioapic: 1, pin: 14
>> [    1.453893] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 25, idx: -1, ioapic: 1, pin: 14
>> [    1.456127] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096
>> [    1.734065] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 26, nr_irqs: 1096
>> [    1.734165] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 27, nr_irqs: 1096
>> [    1.734253] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 28, nr_irqs: 1096
>> [    1.734344] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 29, nr_irqs: 1096
>> [    1.734426] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 30, nr_irqs: 1096
>> [    1.734512] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 31, nr_irqs: 1096
>> [    1.734597] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 32, nr_irqs: 1096
>> [    1.734643] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 33, nr_irqs: 1096
>> [    1.734687] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 34, nr_irqs: 1096
>> [    1.734728] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 35, nr_irqs: 1096
>> [    1.735017] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 36, nr_irqs: 1096
>> [    1.735252] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 37, nr_irqs: 1096
>> [    1.735467] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 38, nr_irqs: 1096
>> [    1.735799] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 39, nr_irqs: 1096
>> [    1.736024] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 40, nr_irqs: 1096
>> [    1.736364] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 41, nr_irqs: 1096
>> [    1.736406] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 41, idx: -1, ioapic: 1, pin: 4
>> [    1.736434] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 42, nr_irqs: 1096
>> [    1.736701] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 41, idx: -1, ioapic: 1, pin: 4
>> [    1.736724] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 43, nr_irqs: 1096
>> [    3.037123] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 44, nr_irqs: 1096
>> [    3.037313] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13
>> [    3.037515] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13
>> [    3.037738] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 45, nr_irqs: 1096
>> [    3.037959] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 46, nr_irqs: 1096
>> [    3.038073] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 47, nr_irqs: 1096
>> [    3.038154] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 48, nr_irqs: 1096
>> [    3.038179] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12
>> [    3.038277] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 49, nr_irqs: 1096
>> [    3.038399] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 50, nr_irqs: 1096
>> [    3.038525] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 51, nr_irqs: 1096
>> [    3.038657] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 52, nr_irqs: 1096
>> [    3.038852] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 53, nr_irqs: 1096
>> [    3.052377] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 54, nr_irqs: 1096
>> [    3.052479] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 54, idx: -1, ioapic: 1, pin: 14
>> [    3.052730] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 55, nr_irqs: 1096
>> [    3.052840] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 56, nr_irqs: 1096
>> [    3.052918] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 57, nr_irqs: 1096
>> [    3.052987] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 58, nr_irqs: 1096
>> [    3.053069] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 59, nr_irqs: 1096
>> [    3.053139] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 60, nr_irqs: 1096
>> [    3.053201] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 61, nr_irqs: 1096
>> [    3.053260] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 62, nr_irqs: 1096
>> [    3.089128] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 63, nr_irqs: 1096
>> [    3.089310] cjq_debug mp_map_pin_to_irq gsi: 48, irq: 63, idx: -1, ioapic: 1, pin: 24
>> [    3.089376] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 64, nr_irqs: 1096
>> [    3.103435] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 65, nr_irqs: 1096
>> [    3.114190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 64, nr_irqs: 1096
>> [    3.114346] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 66, nr_irqs: 1096
>> [    3.121215] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 67, nr_irqs: 1096
>> [    3.121350] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 68, nr_irqs: 1096
>> [    3.121479] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 69, nr_irqs: 1096
>> [    3.121612] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 70, nr_irqs: 1096
>> [    3.121726] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 71, nr_irqs: 1096
>> [    3.121841] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 72, nr_irqs: 1096
>> [    3.121955] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 73, nr_irqs: 1096
>> [    3.122025] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 74, nr_irqs: 1096
>> [    3.122093] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 75, nr_irqs: 1096
>> [    3.122148] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 76, nr_irqs: 1096
>> [    3.122203] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 77, nr_irqs: 1096
>> [    3.122265] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 78, nr_irqs: 1096
>> [    3.122322] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 79, nr_irqs: 1096
>> [    3.122378] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 80, nr_irqs: 1096
>> [    3.122433] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 81, nr_irqs: 1096
>> [    7.838753] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13
>> [    9.619174] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12
>> [    9.619556] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 82, nr_irqs: 1096
>> [    9.622038] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 83, nr_irqs: 1096
>> [    9.634900] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 84, nr_irqs: 1096
>> [    9.635316] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 84, idx: -1, ioapic: 1, pin: 15
>> [    9.635405] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 85, nr_irqs: 1096
>> [   10.006686] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 86, nr_irqs: 1096
>> [   10.006823] cjq_debug mp_map_pin_to_irq gsi: 29, irq: 86, idx: -1, ioapic: 1, pin: 5
>> [   10.007009] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 87, nr_irqs: 1096
>> [   10.008723] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 84, idx: -1, ioapic: 1, pin: 15
>> [   10.009853] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 88, nr_irqs: 1096
>> [   10.010786] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12
>> [   10.010858] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 89, nr_irqs: 1096
>>
>> 2. Why do I translate between irq and gsi?
>>
>> After answering question 1, we know irq != gsi. I also found that in QEMU, pci_qdev_realize->xen_pt_realize->xen_host_pci_device_get->xen_host_pci_get_hex_value obtains the irq number, but later pci_qdev_realize->xen_pt_realize->xc_physdev_map_pirq requires a gsi to be passed in,
> 
> So that's quite a difference.  For some reason on a PV dom0
> xen_host_pci_get_hex_value will return the IRQ that's identity mapped
> to the GSI.
> 
> Is that because a PV dom0 will use acpi_register_gsi_xen() instead of
> acpi_register_gsi_ioapic()?
Not right, PV get irq from /sys/bus/pci/devices/xxxx:xx:xx.x/irq, see xen_pt_realize-> xen_host_pci_device_get-> xen_host_pci_get_dec_value-> xen_host_pci_get_value-> open, and it treats irq as gsi.

> 
>> because it calls into Xen's physdev_map_pirq->allocate_and_map_gsi_pirq to allocate a pirq for that gsi. That is where the error occurred.
>> Not only that: the callback path pci_add_dm_done->xc_physdev_map_pirq also needs the gsi.
>>
>> So, I added the function xc_physdev_gsi_from_irq() to translate the irq to the gsi for QEMU.
>>
>> I didn't find a similar function in the existing Linux code, and I think only "QEMU passthrough for Xen" needs this translation, so I added it to privcmd. If you know of a similar function or a more suitable place, please feel free to tell me.
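For illustration, here is a rough sketch of the intended order of operations, using the existing xc_physdev_map_pirq() together with the xc_physdev_gsi_from_irq() wrapper this series introduces (I am assuming it returns the gsi, negative on failure; the hunk at the bottom of this page uses it without error checking):

#include <xenctrl.h>

/*
 * Hedged sketch, not the actual series code: read the host irq the way
 * QEMU/libxl already do (from sysfs), translate it to the gsi with the
 * new wrapper, and only then ask Xen to map a pirq for the guest.
 */
static int map_host_irq_sketch(xc_interface *xch, uint32_t domid, int host_irq)
{
    int gsi, pirq, r;

    /* New helper from this series: ask dom0's kernel which GSI backs host_irq. */
    gsi = xc_physdev_gsi_from_irq(xch, host_irq);
    if (gsi < 0)
        return gsi;

    /* Existing libxc call: establish the gsi -> pirq mapping for the guest. */
    pirq = gsi;
    r = xc_physdev_map_pirq(xch, domid, gsi, &pirq);

    return r < 0 ? r : pirq;
}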
>>
>> 3. Why do I call PHYSDEVOP_map_pirq in acpi_register_gsi_xen_pvh()?
>>
>> Because if you want to map a gsi for a domU, it must already have a mapping in dom0. See the libxl code:
>> pci_add_dm_done
>> 	xc_physdev_map_pirq
>> 	xc_domain_irq_permission
>> 		XEN_DOMCTL_irq_permission
>> 			pirq_access_permitted
>> xc_physdev_map_pirq obtains the pirq mapped from the gsi, and xc_domain_irq_permission uses that pirq when calling into Xen. If we don't do PHYSDEVOP_map_pirq for passthrough devices on a PVH dom0, pirq_access_permitted finds no irq mapping in dom0 and fails.
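A compressed sketch of the libxl flow quoted above (libxc calls as used in libxl_pci.c, error handling trimmed; this is an illustration, not the verbatim code):

#include <xenctrl.h>

/*
 * xc_physdev_map_pirq() establishes the gsi -> pirq mapping, and
 * xc_domain_irq_permission() then grants the guest access to that pirq.
 * Without the dom0-side mapping, the pirq_access_permitted() check in
 * Xen has nothing to look up and the permission call fails, which is
 * the PVH dom0 problem described above.
 */
static int grant_guest_irq_sketch(xc_interface *xch, uint32_t domid, int gsi)
{
    int pirq = gsi;   /* in/out: Xen writes the allocated pirq here */
    int r = xc_physdev_map_pirq(xch, domid, gsi, &pirq);

    if (r < 0)
        return r;

    /* 1 == allow access; the same call libxl makes on a PV dom0 today. */
    return xc_domain_irq_permission(xch, domid, pirq, 1);
}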
> 
> I'm not sure of this specific case, but we shouldn't attempt to fit
> the same exact PCI pass through workflow that a PV dom0 uses into a
> PVH dom0.  IOW: it might make sense to diverge some paths in order to
> avoid importing PV specific concepts into PVH without a reason.
Yes, I agree with you. I have also tried another method to solve this problem; we can discuss it in the new series.

> 
>> So, I added PHYSDEVOP_map_pirq for PVH dom0. But I think only passthrough devices need it, not every device that calls __acpi_register_gsi. In the next version of the patch, I will restrict PHYSDEVOP_map_pirq to passthrough devices only.
>>
>> 4. Why do I call PHYSDEVOP_setup_gsi in acpi_register_gsi_xen_pvh()?
>>
>> As Roger commented, the gsi of the passthrough device is never unmasked and registered (I added printks in vioapic_hwdom_map_gsi() and found it is never called for the dGPU with gsi 28 in my environment).
>> So, I called PHYSDEVOP_setup_gsi to register the gsi.
>> But I agree with Roger's and Jan's opinion that it is wrong to do PHYSDEVOP_setup_gsi for all devices.
>> So, in the next version of the patch, I will also restrict PHYSDEVOP_setup_gsi to passthrough devices only.
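To keep this concrete, a rough sketch of the PVH dom0 registration step being discussed, using the public physdev interface (struct physdev_setup_gsi, struct physdev_map_pirq) and Linux's HYPERVISOR_physdev_op(). This is only an illustration under those assumptions, not the actual acpi_register_gsi_xen_pvh() change from the series:

#include <linux/errno.h>
#include <xen/interface/xen.h>      /* DOMID_SELF */
#include <xen/interface/physdev.h>  /* struct physdev_setup_gsi / physdev_map_pirq */
#include <asm/xen/hypercall.h>      /* HYPERVISOR_physdev_op() */

/*
 * Hedged sketch of the PVH dom0 side discussed above, not the actual
 * patch: tell Xen how the GSI is configured (trigger/polarity from ACPI)
 * and establish dom0's own GSI -> pirq mapping so that a later
 * XEN_DOMCTL_irq_permission for a passthrough guest can find it.
 * The "passthrough devices only" restriction promised for the next
 * version would gate calls to a helper like this one.
 */
static int pvh_setup_and_map_gsi_sketch(int gsi, int trigger, int polarity)
{
	struct physdev_setup_gsi setup_gsi = {
		.gsi = gsi,
		.triggering = trigger,
		.polarity = polarity,
	};
	struct physdev_map_pirq map_irq = {
		.domid = DOMID_SELF,
		.type = MAP_PIRQ_TYPE_GSI,
		.index = gsi,
		.pirq = gsi,
	};
	int ret;

	ret = HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi);
	if (ret)
		return ret;

	return HYPERVISOR_physdev_op(PHYSDEVOP_map_pirq, &map_irq);
}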
> 
> Right, given how long it's been since the last series, I think we need
> a new series posted in order to see how this looks now.
Agreed, I am looking forward to your comments on the new series.

> 
> Thanks, Roger.
Patch

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index f4c4f17545..47cf2799bf 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1486,6 +1486,7 @@  static void pci_add_dm_done(libxl__egc *egc,
         goto out_no_irq;
     }
     if ((fscanf(f, "%u", &irq) == 1) && irq) {
+        irq = xc_physdev_gsi_from_irq(ctx->xch, irq);
         r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
         if (r < 0) {
             LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",