[v3,3/8] acpi/gpex: Inform os to keep firmware resource map

Message ID 20201223090836.9075-4-cenjiahui@huawei.com (mailing list archive)
State New, archived
Series acpi: Some fixes for pxb support for ARM virt machine

Commit Message

Jiahui Cen Dec. 23, 2020, 9:08 a.m. UTC
There may be some differences in pci resource assignment between guest os
and firmware.

Eg. A Bridge with Bus [d2]
    -+-[0000:d2]---01.0-[d3]----01.0

    where [d2:01.00] is a pcie-pci-bridge with BAR0 (mem, 64-bit, non-pref) [size=256]
          [d3:01.00] is a PCI Device with BAR0 (mem, 64-bit, pref) [size=128K]
                                          BAR4 (mem, 64-bit, pref) [size=64M]

    In EDK2, the Resource Map would be:
        PciBus: Resource Map for Bridge [D2|01|00]
        Type = PMem64; Base = 0x8004000000;     Length = 0x4100000;     Alignment = 0x3FFFFFF
           Base = 0x8004000000; Length = 0x4000000;     Alignment = 0x3FFFFFF;  Owner = PCI [D3|01|00:20]
           Base = 0x8008000000; Length = 0x20000;       Alignment = 0x1FFFF;    Owner = PCI [D3|01|00:10]
        Type =  Mem64; Base = 0x8008100000;     Length = 0x100; Alignment = 0xFFF
    It would use 0x4100000 to calculate the root bus's PMem64 resource window.

    While in Linux, kernel will use 0x1FFFFFF as the alignment to calculate
    the PMem64 size, which would be 0x6000000. So kernel would try to
    allocate 0x6000000 from the PMem64 resource window, but since the window
    size is 0x4100000 as assigned by EDK2, the allocation would fail.

The differences could result in resource assignment failure.
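
The arithmetic in the example above can be checked with a minimal sketch.
The helper name `align_up` is hypothetical, and the 1 MiB granularity used
for the EDK2 window is an assumption inferred from the 0x4100000 length in
the resource map; the 0x1FFFFFF mask is the one quoted for Linux. The real
sizing logic in both EDK2 and Linux is more involved than this.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes of the two prefetchable BARs of [d3:01.00] from the resource map. */
#define BAR4_SIZE 0x4000000ULL /* 64M */
#define BAR0_SIZE 0x20000ULL   /* 128K */

/*
 * Round size up to a multiple of (align_mask + 1).
 * align_mask must be of the form 2^n - 1, as in the resource map.
 */
static uint64_t align_up(uint64_t size, uint64_t align_mask)
{
    assert((align_mask & (align_mask + 1)) == 0);
    return (size + align_mask) & ~align_mask;
}

/* Window length as reported by EDK2, assuming 1 MiB bridge granularity. */
static uint64_t edk2_pmem64_window(void)
{
    return align_up(BAR4_SIZE + BAR0_SIZE, 0xFFFFFULL);
}

/* Size requested by Linux when it aligns with the 0x1FFFFFF mask. */
static uint64_t linux_pmem64_request(void)
{
    return align_up(BAR4_SIZE + BAR0_SIZE, 0x1FFFFFFULL);
}
```

With these inputs the EDK2 window comes out as 0x4100000 and the Linux
request as 0x6000000, so the request does not fit in the window.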

Using the _DSM #5 method to inform the guest OS not to ignore the PCI
configuration that firmware has done at boot time handles these differences.

Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
---
 hw/pci-host/gpex-acpi.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

Comments

Igor Mammedov Dec. 29, 2020, 1:41 p.m. UTC | #1
On Wed, 23 Dec 2020 17:08:31 +0800
Jiahui Cen <cenjiahui@huawei.com> wrote:

> There may be some differences in pci resource assignment between guest os
> and firmware.
> 
> Eg. A Bridge with Bus [d2]
>     -+-[0000:d2]---01.0-[d3]----01.0
> 
>     where [d2:01.00] is a pcie-pci-bridge with BAR0 (mem, 64-bit, non-pref) [size=256]
>           [d3:01.00] is a PCI Device with BAR0 (mem, 64-bit, pref) [size=128K]
>                                           BAR4 (mem, 64-bit, pref) [size=64M]
> 
>     In EDK2, the Resource Map would be:
>         PciBus: Resource Map for Bridge [D2|01|00]
>         Type = PMem64; Base = 0x8004000000;     Length = 0x4100000;     Alignment = 0x3FFFFFF
>            Base = 0x8004000000; Length = 0x4000000;     Alignment = 0x3FFFFFF;  Owner = PCI [D3|01|00:20]
>            Base = 0x8008000000; Length = 0x20000;       Alignment = 0x1FFFF;    Owner = PCI [D3|01|00:10]
>         Type =  Mem64; Base = 0x8008100000;     Length = 0x100; Alignment = 0xFFF
>     It would use 0x4100000 to calculate the root bus's PMem64 resource window.
> 
>     While in Linux, kernel will use 0x1FFFFFF as the alignment to calculate
>     the PMem64 size, which would be 0x6000000. So kernel would try to
>     allocate 0x6000000 from the PMem64 resource window, but since the window
>     size is 0x4100000 as assigned by EDK2, the allocation would fail.
> 
> The diffences could result in resource assignment failure.
> 
> Using _DSM #5 method to inform guest os not to ignore the PCI configuration
> that firmware has done at boot time could handle the differences.

I'm not sure about this one.
The OS should be able to reconfigure PCI resources according to what is plugged and where
(and this is even more true if hotplug is taken into account).

> 
> Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
> ---
>  hw/pci-host/gpex-acpi.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
> index 11b3db8f71..c189306599 100644
> --- a/hw/pci-host/gpex-acpi.c
> +++ b/hw/pci-host/gpex-acpi.c
> @@ -112,10 +112,24 @@ static void acpi_dsdt_add_pci_osc(Aml *dev)
>      UUID = aml_touuid("E5C937D0-3553-4D7A-9117-EA4D19C3434D");
>      ifctx = aml_if(aml_equal(aml_arg(0), UUID));
>      ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(0)));
> -    uint8_t byte_list[1] = {1};
> -    buf = aml_buffer(1, byte_list);
> +    uint8_t byte_list[] = {
> +                0x1 << 0 /* support for functions other than function 0 */ |
> +                0x1 << 5 /* support for function 5 */
> +                };
> +    buf = aml_buffer(ARRAY_SIZE(byte_list), byte_list);
>      aml_append(ifctx1, aml_return(buf));
>      aml_append(ifctx, ifctx1);
> +
> +    /* PCI Firmware Specification 3.1
> +     * 4.6.5. _DSM for Ignoring PCI Boot Configurations
> +     */
> +    /* Arg2: Function Index: 5 */
> +    ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(5)));
> +    /* 0 - The operating system must not ignore the PCI configuration that
> +     *     firmware has done at boot time.
> +     */
> +    aml_append(ifctx1, aml_return(aml_int(0)));
> +    aml_append(ifctx, ifctx1);
>      aml_append(method, ifctx);
>  
>      byte_list[0] = 0;
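
For reference, the function 0 return value built in the hunk above is a
bitmap of supported function indexes. The sketch below shows how that byte
is formed and how a consumer could test a bit. The macro and function names
are hypothetical and are not part of QEMU or of any kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Function index cited in the patch (PCI Firmware Specification 3.1, 4.6.5). */
#define DSM_FN_IGNORE_BOOT_CONFIG 5 /* hypothetical macro name */

/* Same value as the patch's byte_list[0]: bit 0 and bit 5 set. */
static uint8_t dsm_supported_mask(void)
{
    return (uint8_t)((0x1 << 0) | (0x1 << DSM_FN_IGNORE_BOOT_CONFIG));
}

/* Check whether function index fn is advertised in a one-byte bitmap. */
static bool dsm_function_supported(uint8_t mask, unsigned int fn)
{
    assert(fn < 8);
    return ((mask >> fn) & 0x1) != 0;
}
```

The mask evaluates to 0x21, whereas the previous byte_list value of 1
advertised no function other than function 0.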
Michael S. Tsirkin Dec. 30, 2020, 9:22 p.m. UTC | #2
On Tue, Dec 29, 2020 at 02:41:42PM +0100, Igor Mammedov wrote:
> On Wed, 23 Dec 2020 17:08:31 +0800
> Jiahui Cen <cenjiahui@huawei.com> wrote:
> 
> > There may be some differences in pci resource assignment between guest os
> > and firmware.
> > 
> > Eg. A Bridge with Bus [d2]
> >     -+-[0000:d2]---01.0-[d3]----01.0
> > 
> >     where [d2:01.00] is a pcie-pci-bridge with BAR0 (mem, 64-bit, non-pref) [size=256]
> >           [d3:01.00] is a PCI Device with BAR0 (mem, 64-bit, pref) [size=128K]
> >                                           BAR4 (mem, 64-bit, pref) [size=64M]
> > 
> >     In EDK2, the Resource Map would be:
> >         PciBus: Resource Map for Bridge [D2|01|00]
> >         Type = PMem64; Base = 0x8004000000;     Length = 0x4100000;     Alignment = 0x3FFFFFF
> >            Base = 0x8004000000; Length = 0x4000000;     Alignment = 0x3FFFFFF;  Owner = PCI [D3|01|00:20]
> >            Base = 0x8008000000; Length = 0x20000;       Alignment = 0x1FFFF;    Owner = PCI [D3|01|00:10]
> >         Type =  Mem64; Base = 0x8008100000;     Length = 0x100; Alignment = 0xFFF
> >     It would use 0x4100000 to calculate the root bus's PMem64 resource window.
> > 
> >     While in Linux, kernel will use 0x1FFFFFF as the alignment to calculate
> >     the PMem64 size, which would be 0x6000000. So kernel would try to
> >     allocate 0x6000000 from the PMem64 resource window, but since the window
> >     size is 0x4100000 as assigned by EDK2, the allocation would fail.
> > 
> > The diffences could result in resource assignment failure.
> > 
> > Using _DSM #5 method to inform guest os not to ignore the PCI configuration
> > that firmware has done at boot time could handle the differences.
> 
> I'm not sure about this one, 
> OS should able to reconfigure PCI resources according to what and where is plugged
> (and it even more true is hotplug is taken into account)

spec says this:

0: No (The operating system must not ignore the PCI configuration that firmware has done
at boot time. However, the operating system is free to configure the devices in this hierarchy
that have not been configured by the firmware. There may be a reduced level of hot plug
capability support in this hierarchy due to resource constraints. This situation is the same as
the legacy situation where this _DSM is not provided.)
1: Yes (The operating system may ignore the PCI configuration that the firmware has done
at boot time, and reconfigure/rebalance the resources in the hierarchy.)

and

IMPLEMENTATION NOTE
This _DSM function provides backwards compatibility on platforms that can run legacy operating
systems.
Operating systems for two different architectures (e.g., x86 and x64) can be installed on a platform.
The firmware cannot distinguish the operating system in time to change the boot configuration of
devices. Say for instance, an x86 operating system in non-PAE mode is installed on a system. The
x86 operating system cannot access device resource space above 4 GiB. So the firmware is required
to configure devices at boot time using addresses below 4 GiB. On the other hand, if an x64
operating system is installed on this system, it can access device resources above the 4 GiB so it does
not want the firmware to constrain the resource assignment below 4 GiB that the firmware
configures at boot time. It is not possible for the firmware to change this by the time it boots the
operating system. Ignoring the configurations done by firmware at boot time will allow the
operating system to push resource assignment using addresses above 4 GiB for an x64 operating
system while constrain it to addresses below 4 GiB for an x86 operating system.

so fundamentally, saying "1" here just means "you can ignore what
firmware configured if you like".
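
A minimal sketch of how an OS could map the function 5 result to a policy,
following the spec text quoted above. All names are hypothetical, and
treating values other than 0 and 1 as "preserve" is a conservative
assumption, not something the quoted text states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy names for the two values described in the spec. */
enum boot_config_policy {
    BOOT_CONFIG_PRESERVE = 0,   /* OS must not ignore firmware's setup */
    BOOT_CONFIG_MAY_IGNORE = 1, /* OS may reconfigure/rebalance */
};

static enum boot_config_policy
boot_config_policy_from_dsm(bool dsm_present, uint64_t dsm_value)
{
    /* Per the quoted text, value 0 matches the legacy case without this _DSM. */
    if (!dsm_present) {
        return BOOT_CONFIG_PRESERVE;
    }
    /* Assumption: any value other than 1 is handled like 0. */
    return dsm_value == 1 ? BOOT_CONFIG_MAY_IGNORE : BOOT_CONFIG_PRESERVE;
}
```

With the patch under discussion, function 5 returns 0, so such a consumer
would end up with BOOT_CONFIG_PRESERVE.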


I have a different question though: our _CRS etc. is based on what
firmware configured. Is that OK? Or is ACPI expected to somehow
reconfigure itself when the OS reconfigures devices?
I think it's OK, but I could not find documentation either way.


> > 
> > Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
> > ---
> >  hw/pci-host/gpex-acpi.c | 18 ++++++++++++++++--
> >  1 file changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
> > index 11b3db8f71..c189306599 100644
> > --- a/hw/pci-host/gpex-acpi.c
> > +++ b/hw/pci-host/gpex-acpi.c
> > @@ -112,10 +112,24 @@ static void acpi_dsdt_add_pci_osc(Aml *dev)
> >      UUID = aml_touuid("E5C937D0-3553-4D7A-9117-EA4D19C3434D");
> >      ifctx = aml_if(aml_equal(aml_arg(0), UUID));
> >      ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(0)));
> > -    uint8_t byte_list[1] = {1};
> > -    buf = aml_buffer(1, byte_list);
> > +    uint8_t byte_list[] = {
> > +                0x1 << 0 /* support for functions other than function 0 */ |
> > +                0x1 << 5 /* support for function 5 */
> > +                };
> > +    buf = aml_buffer(ARRAY_SIZE(byte_list), byte_list);
> >      aml_append(ifctx1, aml_return(buf));
> >      aml_append(ifctx, ifctx1);
> > +
> > +    /* PCI Firmware Specification 3.1
> > +     * 4.6.5. _DSM for Ignoring PCI Boot Configurations
> > +     */
> > +    /* Arg2: Function Index: 5 */
> > +    ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(5)));
> > +    /* 0 - The operating system must not ignore the PCI configuration that
> > +     *     firmware has done at boot time.
> > +     */
> > +    aml_append(ifctx1, aml_return(aml_int(0)));
> > +    aml_append(ifctx, ifctx1);
> >      aml_append(method, ifctx);
> >  
> >      byte_list[0] = 0;
Jiahui Cen Dec. 31, 2020, 3:30 a.m. UTC | #3
On 2020/12/29 21:41, Igor Mammedov wrote:
> On Wed, 23 Dec 2020 17:08:31 +0800
> Jiahui Cen <cenjiahui@huawei.com> wrote:
> 
>> There may be some differences in pci resource assignment between guest os
>> and firmware.
>>
>> Eg. A Bridge with Bus [d2]
>>     -+-[0000:d2]---01.0-[d3]----01.0
>>
>>     where [d2:01.00] is a pcie-pci-bridge with BAR0 (mem, 64-bit, non-pref) [size=256]
>>           [d3:01.00] is a PCI Device with BAR0 (mem, 64-bit, pref) [size=128K]
>>                                           BAR4 (mem, 64-bit, pref) [size=64M]
>>
>>     In EDK2, the Resource Map would be:
>>         PciBus: Resource Map for Bridge [D2|01|00]
>>         Type = PMem64; Base = 0x8004000000;     Length = 0x4100000;     Alignment = 0x3FFFFFF
>>            Base = 0x8004000000; Length = 0x4000000;     Alignment = 0x3FFFFFF;  Owner = PCI [D3|01|00:20]
>>            Base = 0x8008000000; Length = 0x20000;       Alignment = 0x1FFFF;    Owner = PCI [D3|01|00:10]
>>         Type =  Mem64; Base = 0x8008100000;     Length = 0x100; Alignment = 0xFFF
>>     It would use 0x4100000 to calculate the root bus's PMem64 resource window.
>>
>>     While in Linux, kernel will use 0x1FFFFFF as the alignment to calculate
>>     the PMem64 size, which would be 0x6000000. So kernel would try to
>>     allocate 0x6000000 from the PMem64 resource window, but since the window
>>     size is 0x4100000 as assigned by EDK2, the allocation would fail.
>>
>> The diffences could result in resource assignment failure.
>>
>> Using _DSM #5 method to inform guest os not to ignore the PCI configuration
>> that firmware has done at boot time could handle the differences.
> 
> I'm not sure about this one, 
> OS should able to reconfigure PCI resources according to what and where is plugged
> (and it even more true is hotplug is taken into account)

I think the problem is that the OS cannot reconfigure the resource windows set in _CRS
by firmware, which means the total resource range the OS can assign from is limited.
So would it be better for the OS to prefer the resource assignment done by firmware and
reconfigure only those resources that were not properly assigned?

Also, the BIOS seems to have taken hotplug reserved resources into account.

Thanks,
Jiahui

>>
>> Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
>> ---
>>  hw/pci-host/gpex-acpi.c | 18 ++++++++++++++++--
>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
>> index 11b3db8f71..c189306599 100644
>> --- a/hw/pci-host/gpex-acpi.c
>> +++ b/hw/pci-host/gpex-acpi.c
>> @@ -112,10 +112,24 @@ static void acpi_dsdt_add_pci_osc(Aml *dev)
>>      UUID = aml_touuid("E5C937D0-3553-4D7A-9117-EA4D19C3434D");
>>      ifctx = aml_if(aml_equal(aml_arg(0), UUID));
>>      ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(0)));
>> -    uint8_t byte_list[1] = {1};
>> -    buf = aml_buffer(1, byte_list);
>> +    uint8_t byte_list[] = {
>> +                0x1 << 0 /* support for functions other than function 0 */ |
>> +                0x1 << 5 /* support for function 5 */
>> +                };
>> +    buf = aml_buffer(ARRAY_SIZE(byte_list), byte_list);
>>      aml_append(ifctx1, aml_return(buf));
>>      aml_append(ifctx, ifctx1);
>> +
>> +    /* PCI Firmware Specification 3.1
>> +     * 4.6.5. _DSM for Ignoring PCI Boot Configurations
>> +     */
>> +    /* Arg2: Function Index: 5 */
>> +    ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(5)));
>> +    /* 0 - The operating system must not ignore the PCI configuration that
>> +     *     firmware has done at boot time.
>> +     */
>> +    aml_append(ifctx1, aml_return(aml_int(0)));
>> +    aml_append(ifctx, ifctx1);
>>      aml_append(method, ifctx);
>>  
>>      byte_list[0] = 0;
> 
> .
>
Jiahui Cen Dec. 31, 2020, 8:22 a.m. UTC | #4
On 2020/12/31 5:22, Michael S. Tsirkin wrote:
> On Tue, Dec 29, 2020 at 02:41:42PM +0100, Igor Mammedov wrote:
>> On Wed, 23 Dec 2020 17:08:31 +0800
>> Jiahui Cen <cenjiahui@huawei.com> wrote:
>>
>>> There may be some differences in pci resource assignment between guest os
>>> and firmware.
>>>
>>> Eg. A Bridge with Bus [d2]
>>>     -+-[0000:d2]---01.0-[d3]----01.0
>>>
>>>     where [d2:01.00] is a pcie-pci-bridge with BAR0 (mem, 64-bit, non-pref) [size=256]
>>>           [d3:01.00] is a PCI Device with BAR0 (mem, 64-bit, pref) [size=128K]
>>>                                           BAR4 (mem, 64-bit, pref) [size=64M]
>>>
>>>     In EDK2, the Resource Map would be:
>>>         PciBus: Resource Map for Bridge [D2|01|00]
>>>         Type = PMem64; Base = 0x8004000000;     Length = 0x4100000;     Alignment = 0x3FFFFFF
>>>            Base = 0x8004000000; Length = 0x4000000;     Alignment = 0x3FFFFFF;  Owner = PCI [D3|01|00:20]
>>>            Base = 0x8008000000; Length = 0x20000;       Alignment = 0x1FFFF;    Owner = PCI [D3|01|00:10]
>>>         Type =  Mem64; Base = 0x8008100000;     Length = 0x100; Alignment = 0xFFF
>>>     It would use 0x4100000 to calculate the root bus's PMem64 resource window.
>>>
>>>     While in Linux, kernel will use 0x1FFFFFF as the alignment to calculate
>>>     the PMem64 size, which would be 0x6000000. So kernel would try to
>>>     allocate 0x6000000 from the PMem64 resource window, but since the window
>>>     size is 0x4100000 as assigned by EDK2, the allocation would fail.
>>>
>>> The diffences could result in resource assignment failure.
>>>
>>> Using _DSM #5 method to inform guest os not to ignore the PCI configuration
>>> that firmware has done at boot time could handle the differences.
>>
>> I'm not sure about this one, 
>> OS should able to reconfigure PCI resources according to what and where is plugged
>> (and it even more true is hotplug is taken into account)
> 
> spec says this:
> 
> 0: No (The operating system must not ignore the PCI configuration that firmware has done
> at boot time. However, the operating system is free to configure the devices in this hierarchy
> that have not been configured by the firmware. There may be a reduced level of hot plug
> capability support in this hierarchy due to resource constraints. This situation is the same as
> the legacy situation where this _DSM is not provided.)
> 1: Yes (The operating system may ignore the PCI configuration that the firmware has done
> at boot time, and reconfigure/rebalance the resources in the hierarchy.)
> 
> and
> 
> IMPLEMENTATION NOTE
> This _DSM function provides backwards compatibility on platforms that can run legacy operating
> systems.
> Operating systems for two different architectures (e.g., x86 and x64) can be installed on a platform.
> The firmware cannot distinguish the operating system in time to change the boot configuration of
> devices. Say for instance, an x86 operating system in non-PAE mode is installed on a system. The
> x86 operating system cannot access device resource space above 4 GiB. So the firmware is required
> to configure devices at boot time using addresses below 4 GiB. On the other hand, if an x64
> operating system is installed on this system, it can access device resources above the 4 GiB so it does
> not want the firmware to constrain the resource assignment below 4 GiB that the firmware
> configures at boot time. It is not possible for the firmware to change this by the time it boots the
> operating system. Ignoring the configurations done by firmware at boot time will allow the
> operating system to push resource assignment using addresses above 4 GiB for an x64 operating
> system while constrain it to addresses below 4 GiB for an x86 operating system.
> 
> so fundamentally, saying "1" here just means "you can ignore what
> firmware configured if you like".
> 
> 
> I have a different question though: our CRS etc is based on what
> firmware configured. Is that ok? Or is ACPI expected to somehow
> reconfigure itself when OS reconfigures devices?
> Think it's ok but could not find documentation either way.

In my humble opinion, it is ok.

I'm not sure whether it is useful, but PCI Firmware Specification Revision 3.0, Chapter 3.5, says:

Firmware must configure all Host Bridges in the systems, even if they are not connected to a
console or boot device. Firmware must configure Host Bridges in order to allow operating systems
to use the devices below the Host Bridges. This is because the Host Bridges programming model is
not defined by the PCI Specifications. “Configured” in this context means that:
- Memory and I/O resources are assigned and configured.
- Includes both the resources consumed by the Host Bridge and the resources passed through to
  the secondary bus.
- The bridge is enabled to receive and forward transactions.
- The bridge is operating in “safe” mode. Safe mode includes:
  ● Enabling resources such as: I/O Port, Memory addresses, VGA routing, bus number, etc.
  ● Enabling detection of parity and system errors
  ● Programming cacheline, latency timer, and other registers as required by the PCI
    Specifications.


I'm not sure, but does "This is because the Host Bridges
programming model is not defined by the PCI Specifications"
mean that the OS has no way to reconfigure Host Bridges (and
their _CRS etc. configuration)?

Please point it out if I misunderstand the spec.

Thanks,
Jiahui

> 
>>>
>>> Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
>>> ---
>>>  hw/pci-host/gpex-acpi.c | 18 ++++++++++++++++--
>>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
>>> index 11b3db8f71..c189306599 100644
>>> --- a/hw/pci-host/gpex-acpi.c
>>> +++ b/hw/pci-host/gpex-acpi.c
>>> @@ -112,10 +112,24 @@ static void acpi_dsdt_add_pci_osc(Aml *dev)
>>>      UUID = aml_touuid("E5C937D0-3553-4D7A-9117-EA4D19C3434D");
>>>      ifctx = aml_if(aml_equal(aml_arg(0), UUID));
>>>      ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(0)));
>>> -    uint8_t byte_list[1] = {1};
>>> -    buf = aml_buffer(1, byte_list);
>>> +    uint8_t byte_list[] = {
>>> +                0x1 << 0 /* support for functions other than function 0 */ |
>>> +                0x1 << 5 /* support for function 5 */
>>> +                };
>>> +    buf = aml_buffer(ARRAY_SIZE(byte_list), byte_list);
>>>      aml_append(ifctx1, aml_return(buf));
>>>      aml_append(ifctx, ifctx1);
>>> +
>>> +    /* PCI Firmware Specification 3.1
>>> +     * 4.6.5. _DSM for Ignoring PCI Boot Configurations
>>> +     */
>>> +    /* Arg2: Function Index: 5 */
>>> +    ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(5)));
>>> +    /* 0 - The operating system must not ignore the PCI configuration that
>>> +     *     firmware has done at boot time.
>>> +     */
>>> +    aml_append(ifctx1, aml_return(aml_int(0)));
>>> +    aml_append(ifctx, ifctx1);
>>>      aml_append(method, ifctx);
>>>  
>>>      byte_list[0] = 0;
> 
> .
>
Igor Mammedov Jan. 5, 2021, 12:35 a.m. UTC | #5
On Wed, 30 Dec 2020 16:22:08 -0500
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Dec 29, 2020 at 02:41:42PM +0100, Igor Mammedov wrote:
> > On Wed, 23 Dec 2020 17:08:31 +0800
> > Jiahui Cen <cenjiahui@huawei.com> wrote:
> >   
> > > There may be some differences in pci resource assignment between guest os
> > > and firmware.
> > > 
> > > Eg. A Bridge with Bus [d2]
> > >     -+-[0000:d2]---01.0-[d3]----01.0
> > > 
> > >     where [d2:01.00] is a pcie-pci-bridge with BAR0 (mem, 64-bit, non-pref) [size=256]
> > >           [d3:01.00] is a PCI Device with BAR0 (mem, 64-bit, pref) [size=128K]
> > >                                           BAR4 (mem, 64-bit, pref) [size=64M]
> > > 
> > >     In EDK2, the Resource Map would be:
> > >         PciBus: Resource Map for Bridge [D2|01|00]
> > >         Type = PMem64; Base = 0x8004000000;     Length = 0x4100000;     Alignment = 0x3FFFFFF
> > >            Base = 0x8004000000; Length = 0x4000000;     Alignment = 0x3FFFFFF;  Owner = PCI [D3|01|00:20]
> > >            Base = 0x8008000000; Length = 0x20000;       Alignment = 0x1FFFF;    Owner = PCI [D3|01|00:10]
> > >         Type =  Mem64; Base = 0x8008100000;     Length = 0x100; Alignment = 0xFFF
> > >     It would use 0x4100000 to calculate the root bus's PMem64 resource window.
> > > 
> > >     While in Linux, kernel will use 0x1FFFFFF as the alignment to calculate
> > >     the PMem64 size, which would be 0x6000000. So kernel would try to
> > >     allocate 0x6000000 from the PMem64 resource window, but since the window
> > >     size is 0x4100000 as assigned by EDK2, the allocation would fail.
> > > 
> > > The diffences could result in resource assignment failure.
> > > 
> > > Using _DSM #5 method to inform guest os not to ignore the PCI configuration
> > > that firmware has done at boot time could handle the differences.  
> > 
> > I'm not sure about this one, 
> > OS should able to reconfigure PCI resources according to what and where is plugged
> > (and it even more true is hotplug is taken into account)  
> 
> spec says this:
> 
> 0: No (The operating system must not ignore the PCI configuration that firmware has done
> at boot time. However, the operating system is free to configure the devices in this hierarchy
> that have not been configured by the firmware. There may be a reduced level of hot plug
> capability support in this hierarchy due to resource constraints. This situation is the same as
> the legacy situation where this _DSM is not provided.)
> 1: Yes (The operating system may ignore the PCI configuration that the firmware has done
> at boot time, and reconfigure/rebalance the resources in the hierarchy.)
I have more or less convinced myself that this is just a matter of hotplug support:
reconfiguration might need to be implemented in the guest kernel and maybe in QEMU.

Though I have two questions:

 1. Does it work for the PC machine with the current kernel, and if so, why?
 2. What would it take to make it work for arm/virt?

> and
> 
> IMPLEMENTATION NOTE
> This _DSM function provides backwards compatibility on platforms that can run legacy operating
> systems.
> Operating systems for two different architectures (e.g., x86 and x64) can be installed on a platform.
> The firmware cannot distinguish the operating system in time to change the boot configuration of
> devices. Say for instance, an x86 operating system in non-PAE mode is installed on a system. The
> x86 operating system cannot access device resource space above 4 GiB. So the firmware is required
> to configure devices at boot time using addresses below 4 GiB. On the other hand, if an x64
> operating system is installed on this system, it can access device resources above the 4 GiB so it does
> not want the firmware to constrain the resource assignment below 4 GiB that the firmware
> configures at boot time. It is not possible for the firmware to change this by the time it boots the
> operating system. Ignoring the configurations done by firmware at boot time will allow the
> operating system to push resource assignment using addresses above 4 GiB for an x64 operating
> system while constrain it to addresses below 4 GiB for an x86 operating system.
> 
> so fundamentally, saying "1" here just means "you can ignore what
> firmware configured if you like".
> 
> 
> I have a different question though: our CRS etc is based on what
> firmware configured. Is that ok? Or is ACPI expected to somehow
> reconfigure itself when OS reconfigures devices?
> Think it's ok but could not find documentation either way.

The guest consumes the DSDT only at boot time;
reconfiguration can be done later by the PCI subsystem without
ACPI (at least it used to be so).

However, _DSM is dynamic
and may be evaluated at runtime,
though I don't know whether the kernel would re-evaluate this feature bit after boot.


> 
> 
> > > 
> > > Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
> > > ---
> > >  hw/pci-host/gpex-acpi.c | 18 ++++++++++++++++--
> > >  1 file changed, 16 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
> > > index 11b3db8f71..c189306599 100644
> > > --- a/hw/pci-host/gpex-acpi.c
> > > +++ b/hw/pci-host/gpex-acpi.c
> > > @@ -112,10 +112,24 @@ static void acpi_dsdt_add_pci_osc(Aml *dev)
> > >      UUID = aml_touuid("E5C937D0-3553-4D7A-9117-EA4D19C3434D");
> > >      ifctx = aml_if(aml_equal(aml_arg(0), UUID));
> > >      ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(0)));
> > > -    uint8_t byte_list[1] = {1};
> > > -    buf = aml_buffer(1, byte_list);
> > > +    uint8_t byte_list[] = {
> > > +                0x1 << 0 /* support for functions other than function 0 */ |
> > > +                0x1 << 5 /* support for function 5 */
> > > +                };
> > > +    buf = aml_buffer(ARRAY_SIZE(byte_list), byte_list);
> > >      aml_append(ifctx1, aml_return(buf));
> > >      aml_append(ifctx, ifctx1);
> > > +
> > > +    /* PCI Firmware Specification 3.1
> > > +     * 4.6.5. _DSM for Ignoring PCI Boot Configurations
> > > +     */
> > > +    /* Arg2: Function Index: 5 */
> > > +    ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(5)));
> > > +    /* 0 - The operating system must not ignore the PCI configuration that
> > > +     *     firmware has done at boot time.
> > > +     */
> > > +    aml_append(ifctx1, aml_return(aml_int(0)));
> > > +    aml_append(ifctx, ifctx1);
> > >      aml_append(method, ifctx);
> > >  
> > >      byte_list[0] = 0;  
> 
>
Jiahui Cen Jan. 5, 2021, 1:53 a.m. UTC | #6
On 2021/1/5 8:35, Igor Mammedov wrote:
> On Wed, 30 Dec 2020 16:22:08 -0500
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
>> On Tue, Dec 29, 2020 at 02:41:42PM +0100, Igor Mammedov wrote:
>>> On Wed, 23 Dec 2020 17:08:31 +0800
>>> Jiahui Cen <cenjiahui@huawei.com> wrote:
>>>   
>>>> There may be some differences in pci resource assignment between guest os
>>>> and firmware.
>>>>
>>>> Eg. A Bridge with Bus [d2]
>>>>     -+-[0000:d2]---01.0-[d3]----01.0
>>>>
>>>>     where [d2:01.00] is a pcie-pci-bridge with BAR0 (mem, 64-bit, non-pref) [size=256]
>>>>           [d3:01.00] is a PCI Device with BAR0 (mem, 64-bit, pref) [size=128K]
>>>>                                           BAR4 (mem, 64-bit, pref) [size=64M]
>>>>
>>>>     In EDK2, the Resource Map would be:
>>>>         PciBus: Resource Map for Bridge [D2|01|00]
>>>>         Type = PMem64; Base = 0x8004000000;     Length = 0x4100000;     Alignment = 0x3FFFFFF
>>>>            Base = 0x8004000000; Length = 0x4000000;     Alignment = 0x3FFFFFF;  Owner = PCI [D3|01|00:20]
>>>>            Base = 0x8008000000; Length = 0x20000;       Alignment = 0x1FFFF;    Owner = PCI [D3|01|00:10]
>>>>         Type =  Mem64; Base = 0x8008100000;     Length = 0x100; Alignment = 0xFFF
>>>>     It would use 0x4100000 to calculate the root bus's PMem64 resource window.
>>>>
>>>>     While in Linux, kernel will use 0x1FFFFFF as the alignment to calculate
>>>>     the PMem64 size, which would be 0x6000000. So kernel would try to
>>>>     allocate 0x6000000 from the PMem64 resource window, but since the window
>>>>     size is 0x4100000 as assigned by EDK2, the allocation would fail.
>>>>
>>>> The diffences could result in resource assignment failure.
>>>>
>>>> Using _DSM #5 method to inform guest os not to ignore the PCI configuration
>>>> that firmware has done at boot time could handle the differences.  
>>>
>>> I'm not sure about this one, 
>>> OS should able to reconfigure PCI resources according to what and where is plugged
>>> (and it even more true is hotplug is taken into account)  
>>
>> spec says this:
>>
>> 0: No (The operating system must not ignore the PCI configuration that firmware has done
>> at boot time. However, the operating system is free to configure the devices in this hierarchy
>> that have not been configured by the firmware. There may be a reduced level of hot plug
>> capability support in this hierarchy due to resource constraints. This situation is the same as
>> the legacy situation where this _DSM is not provided.)
>> 1: Yes (The operating system may ignore the PCI configuration that the firmware has done
>> at boot time, and reconfigure/rebalance the resources in the hierarchy.)
> I sort of convinced my self that's is just hotplug work might need to implement reconfiguration
> in guest kernel and maybe QEMU
> 
> Though I have a question,
> 
>  1. does it work for PC machine with current kernel, if so why?
>  2. what it would take to make it work for arm/virt?
> 

1. On x86, the kernel generally keeps the configuration done by firmware,
so nothing goes wrong on the PC machine.

2. We add a _DSM method to the DSDT to inform the guest to keep
the firmware's configuration, just like on x86.

>> and
>>
>> IMPLEMENTATION NOTE
>> This _DSM function provides backwards compatibility on platforms that can run legacy operating
>> systems.
>> Operating systems for two different architectures (e.g., x86 and x64) can be installed on a platform.
>> The firmware cannot distinguish the operating system in time to change the boot configuration of
>> devices. Say for instance, an x86 operating system in non-PAE mode is installed on a system. The
>> x86 operating system cannot access device resource space above 4 GiB. So the firmware is required
>> to configure devices at boot time using addresses below 4 GiB. On the other hand, if an x64
>> operating system is installed on this system, it can access device resources above the 4 GiB so it does
>> not want the firmware to constrain the resource assignment below 4 GiB that the firmware
>> configures at boot time. It is not possible for the firmware to change this by the time it boots the
>> operating system. Ignoring the configurations done by firmware at boot time will allow the
>> operating system to push resource assignment using addresses above 4 GiB for an x64 operating
>> system while constrain it to addresses below 4 GiB for an x86 operating system.
>>
>> so fundamentally, saying "1" here just means "you can ignore what
>> firmware configured if you like".
>>
>>
>> I have a different question though: our CRS etc is based on what
>> firmware configured. Is that ok? Or is ACPI expected to somehow
>> reconfigure itself when OS reconfigures devices?
>> Think it's ok but could not find documentation either way.
> 
> The guest consumes the DSDT only at boot time;
> reconfiguration can be done later by the PCI subsystem without
> ACPI (at least it used to be so).
> 
> However, _DSM is dynamic
> and may be evaluated at runtime,
> though I don't know if the kernel would re-evaluate this feature bit after boot.
> 

It seems the kernel evaluates _DSM only at boot time.

Thanks,
Jiahui

Laszlo Ersek Jan. 5, 2021, 7:33 p.m. UTC | #7
On 01/05/21 01:35, Igor Mammedov wrote:
> I sort of convinced myself that it is just the hotplug work that might need
> reconfiguration to be implemented in the guest kernel and maybe QEMU.
> 
> Though I have a question:
> 
>  1. Does it work for the PC machine with the current kernel, and if so, why?
>  2. What would it take to make it work for arm/virt?

The Linux/arm64 guest deals with PCI resources differently for
historical reasons. I was extremely confused by that as well, but Ard
explained here:
<https://www.redhat.com/archives/edk2-devel-archive/2020-December/msg01027.html>.

(Do not be alarmed by Ard's initial statement "That is not going to
work"; he later revised that here:
<https://lists.gnu.org/archive/html/qemu-devel/2020-12/msg05033.html>.)

Thanks,
Laszlo

Igor Mammedov Jan. 6, 2021, 1:29 p.m. UTC | #8
On Tue, 5 Jan 2021 09:53:49 +0800
Jiahui Cen <cenjiahui@huawei.com> wrote:

> On 2021/1/5 8:35, Igor Mammedov wrote:
> > The guest consumes the DSDT only at boot time;
> > reconfiguration can be done later by the PCI subsystem without
> > ACPI (at least it used to be so).
> > 
> > However, _DSM is dynamic
> > and may be evaluated at runtime,
> > though I don't know if the kernel would re-evaluate this feature bit after boot.
> > 
> 
> It seems the kernel evaluates _DSM only at boot time.

OK, let's respin this series without 5/8
to avoid mixing unrelated changes in one series.

We can think about 5/8 some more and return to it later if it proves hard to merge.

Jiahui Cen Jan. 7, 2021, 5:54 a.m. UTC | #9
On 2021/1/6 21:29, Igor Mammedov wrote:
> On Tue, 5 Jan 2021 09:53:49 +0800
> Jiahui Cen <cenjiahui@huawei.com> wrote:
> 
>> On 2021/1/5 8:35, Igor Mammedov wrote:
>>> The guest consumes the DSDT only at boot time;
>>> reconfiguration can be done later by the PCI subsystem without
>>> ACPI (at least it used to be so).
>>>
>>> However, _DSM is dynamic
>>> and may be evaluated at runtime,
>>> though I don't know if the kernel would re-evaluate this feature bit after boot.
>>>
>>
>> It seems the kernel evaluates _DSM only at boot time.
> 
> OK, let's respin this series without 5/8
> to avoid mixing unrelated changes in one series.
> 
> We can think about 5/8 some more and return to it later if it proves hard to merge.

OK, I'll split patch [5/8] out of this series and
send it separately.

Thanks for the discussion.

Thanks,
Jiahui


Patch

diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
index 11b3db8f71..c189306599 100644
--- a/hw/pci-host/gpex-acpi.c
+++ b/hw/pci-host/gpex-acpi.c
@@ -112,10 +112,24 @@  static void acpi_dsdt_add_pci_osc(Aml *dev)
     UUID = aml_touuid("E5C937D0-3553-4D7A-9117-EA4D19C3434D");
     ifctx = aml_if(aml_equal(aml_arg(0), UUID));
     ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(0)));
-    uint8_t byte_list[1] = {1};
-    buf = aml_buffer(1, byte_list);
+    uint8_t byte_list[] = {
+                0x1 << 0 /* support for functions other than function 0 */ |
+                0x1 << 5 /* support for function 5 */
+                };
+    buf = aml_buffer(ARRAY_SIZE(byte_list), byte_list);
     aml_append(ifctx1, aml_return(buf));
     aml_append(ifctx, ifctx1);
+
+    /* PCI Firmware Specification 3.1
+     * 4.6.5. _DSM for Ignoring PCI Boot Configurations
+     */
+    /* Arg2: Function Index: 5 */
+    ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(5)));
+    /* 0 - The operating system must not ignore the PCI configuration that
+     *     firmware has done at boot time.
+     */
+    aml_append(ifctx1, aml_return(aml_int(0)));
+    aml_append(ifctx, ifctx1);
     aml_append(method, ifctx);
 
     byte_list[0] = 0;
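
The size mismatch described in the commit message can be checked with a few lines of arithmetic. The sketch below assumes that both sides round the sum of the two prefetchable BARs (64M + 128K) up to a power-of-two granule: 1 MiB for EDK2 (an assumption, though it is consistent with the reported length 0x4100000) and 0x1FFFFFF + 1 = 32 MiB for Linux (the alignment quoted in the commit message). `align_up` and `pmem64_bar_total` are hypothetical helper names, not functions from EDK2, Linux or QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Round size up to the next multiple of align; align must be a power of two. */
static uint64_t align_up(uint64_t size, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Sum of the two prefetchable 64-bit BARs behind the bridge: 64M + 128K. */
static uint64_t pmem64_bar_total(void)
{
    return 0x4000000ULL + 0x20000ULL;
}
```

With these inputs, `align_up(pmem64_bar_total(), 0x100000)` gives 0x4100000 and `align_up(pmem64_bar_total(), 0x2000000)` gives 0x6000000, the two figures in the commit message. The second value does not fit in a window sized from the first, which is the failure the patch works around.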