diff mbox series

[v3,2/2] ACPI: APEI: Filter the PCI MCFG address with an arch-agnostic method

Message ID YW5OmSBM4mO1lDHs@Dennis-MBP.local (mailing list archive)
State Changes Requested
Delegated to: Bjorn Helgaas
Headers show
Series PCI MCFG consolidation and APEI resource filterin | expand

Commit Message

Xuesong Chen Oct. 19, 2021, 4:50 a.m. UTC
The commit d91525eb8ee6 ("ACPI, EINJ: Enhance error injection tolerance
level") fixes the issue that the ACPI/APEI can not access the PCI MCFG
address on x86 platform, but this issue can also happen on other
architectures, for instance, we got below error message on arm64 platform:
...
APEI: Can not request [mem 0x50100000-0x50100003] for APEI EINJ Trigger registers
...

This patch will try to handle this case in a more common way instead of the
original 'arch' specific solution, which will be beneficial to all the
APEI-dependent platforms after that.

Signed-off-by: Xuesong Chen <xuesong.chen@linux.alibaba.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Rafael. J. Wysocki <rafael@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Tomasz Nowicki <tn@semihalf.com>
---
 arch/x86/pci/mmconfig-shared.c | 28 --------------------------
 drivers/acpi/apei/apei-base.c  | 45 ++++++++++++++++++++++++++++--------------
 2 files changed, 30 insertions(+), 43 deletions(-)

Comments

Bjorn Helgaas Oct. 19, 2021, 3:04 p.m. UTC | #1
On Tue, Oct 19, 2021 at 12:50:33PM +0800, Xuesong Chen wrote:
> The commit d91525eb8ee6 ("ACPI, EINJ: Enhance error injection tolerance
> level") fixes the issue that the ACPI/APEI can not access the PCI MCFG
> address on x86 platform, but this issue can also happen on other
> architectures, for instance, we got below error message on arm64 platform:
> ...
> APEI: Can not request [mem 0x50100000-0x50100003] for APEI EINJ Trigger registers
> ...
> 
> This patch will try to handle this case in a more common way instead of the
> original 'arch' specific solution, which will be beneficial to all the
> APEI-dependent platforms after that.
> 
> Signed-off-by: Xuesong Chen <xuesong.chen@linux.alibaba.com>
> Reported-by: kernel test robot <lkp@intel.com>

The purpose of this patch is not to fix a problem reported by the
kernel test robot, so remove this tag.

I know the robot found a problem with a previous version of this
patch, but we treat that the same as a code review comment.  We
normally don't explicitly credit reviewers unless it was something
major, and then it would go in the commit log, not a "Reported-by"
tag.

It makes sense to credit the kernel test robot for things found in
Linus' tree, but it's a little too aggressive about suggesting the tag
for problems with unmerged changes.

> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

This tag can only be added when Lorenzo explicitly supplies it
himself.  I do not see that on the mailing list, so please remove this
tag as well.  After Lorenzo supplies it, you can include it in future
postings as long as you don't make significant changes to the patch.

Bjorn
Bjorn Helgaas Oct. 19, 2021, 7:23 p.m. UTC | #2
[+cc Huang in case I'm misinterpreting EINJ resource requests]

On Tue, Oct 19, 2021 at 12:50:33PM +0800, Xuesong Chen wrote:
> The commit d91525eb8ee6 ("ACPI, EINJ: Enhance error injection tolerance
> level") fixes the issue that the ACPI/APEI can not access the PCI MCFG
> address on x86 platform, but this issue can also happen on other
> architectures, for instance, we got below error message on arm64 platform:
> ...
> APEI: Can not request [mem 0x50100000-0x50100003] for APEI EINJ Trigger registers

I'm guessing this address is something in the MCFG area?  I wish we
could also include the matching MCFG info For x86, we put something
like this in the dmesg log, but I guess pci_mcfg.c doesn't do this:

  PCI: MMCONFIG for domain 0000 [bus 00-7f] at [mem 0xf0000000-0xf7ffffff] (base 0xf0000000)

> This patch will try to handle this case in a more common way instead of the
> original 'arch' specific solution, which will be beneficial to all the
> APEI-dependent platforms after that.

This actually doesn't say anything about what the patch does or how it
works.  It says "handles this case in a more common way" but with no
details.

The EINJ table contains "injection instructions" that can read or
write "register regions" described by generic address structures (see
ACPI v6.3, sec 18.6.2 and 18.6.3), and __einj_error_trigger() requests
those register regions with request_mem_region() or request_region()
before executing the injections instructions.

IIUC, this patch basically says "if this region is part of the MCFG
area, we don't need to reserve it." That leads to the questions of why
we need to reserve *any* of the areas and why it's safe to simply skip
reserving regions that are part of the MCFG area.

> Signed-off-by: Xuesong Chen <xuesong.chen@linux.alibaba.com>
> Reported-by: kernel test robot <lkp@intel.com>
> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: James Morse <james.morse@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Rafael. J. Wysocki <rafael@kernel.org>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Tomasz Nowicki <tn@semihalf.com>
> ---
>  arch/x86/pci/mmconfig-shared.c | 28 --------------------------
>  drivers/acpi/apei/apei-base.c  | 45 ++++++++++++++++++++++++++++--------------
>  2 files changed, 30 insertions(+), 43 deletions(-)
> 
> diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
> index 0b961fe6..12f7d96 100644
> --- a/arch/x86/pci/mmconfig-shared.c
> +++ b/arch/x86/pci/mmconfig-shared.c
> @@ -605,32 +605,6 @@ static int __init pci_parse_mcfg(struct acpi_table_header *header)
>  	return 0;
>  }
>  
> -#ifdef CONFIG_ACPI_APEI
> -extern int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size,
> -				     void *data), void *data);
> -
> -static int pci_mmcfg_for_each_region(int (*func)(__u64 start, __u64 size,
> -				     void *data), void *data)
> -{
> -	struct pci_mmcfg_region *cfg;
> -	int rc;
> -
> -	if (list_empty(&pci_mmcfg_list))
> -		return 0;
> -
> -	list_for_each_entry(cfg, &pci_mmcfg_list, list) {
> -		rc = func(cfg->res.start, resource_size(&cfg->res), data);
> -		if (rc)
> -			return rc;
> -	}
> -
> -	return 0;
> -}
> -#define set_apei_filter() (arch_apei_filter_addr = pci_mmcfg_for_each_region)
> -#else
> -#define set_apei_filter()
> -#endif
> -
>  static void __init __pci_mmcfg_init(int early)
>  {
>  	pci_mmcfg_reject_broken(early);
> @@ -665,8 +639,6 @@ void __init pci_mmcfg_early_init(void)
>  		else
>  			acpi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
>  		__pci_mmcfg_init(1);
> -
> -		set_apei_filter();
>  	}
>  }
>  
> diff --git a/drivers/acpi/apei/apei-base.c b/drivers/acpi/apei/apei-base.c
> index c7fdb12..daae75a 100644
> --- a/drivers/acpi/apei/apei-base.c
> +++ b/drivers/acpi/apei/apei-base.c
> @@ -21,6 +21,7 @@
>  #include <linux/kernel.h>
>  #include <linux/module.h>
>  #include <linux/init.h>
> +#include <linux/pci.h>
>  #include <linux/acpi.h>
>  #include <linux/slab.h>
>  #include <linux/io.h>
> @@ -448,13 +449,34 @@ static int apei_get_nvs_resources(struct apei_resources *resources)
>  	return acpi_nvs_for_each_region(apei_get_res_callback, resources);
>  }
>  
> -int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size,
> -				     void *data), void *data);
> -static int apei_get_arch_resources(struct apei_resources *resources)
> +#ifdef CONFIG_PCI
> +extern struct list_head pci_mmcfg_list;
> +static int apei_filter_mcfg_addr(struct apei_resources *res,
> +			struct apei_resources *mcfg_res)
> +{
> +	int rc = 0;
> +	struct pci_mmcfg_region *cfg;
> +
> +	if (list_empty(&pci_mmcfg_list))
> +		return 0;
> +
> +	apei_resources_init(mcfg_res);
> +	list_for_each_entry(cfg, &pci_mmcfg_list, list) {
> +		rc = apei_res_add(&mcfg_res->iomem, cfg->res.start, resource_size(&cfg->res));
> +		if (rc)
> +			return rc;
> +	}
>  
> +	/* filter the mcfg resource from current APEI's */
> +	return apei_resources_sub(res, mcfg_res);
> +}
> +#else
> +static inline int apei_filter_mcfg_addr(struct apei_resources *res,
> +			struct apei_resources *mcfg_res)
>  {
> -	return arch_apei_filter_addr(apei_get_res_callback, resources);
> +	return 0;
>  }
> +#endif
>  
>  /*
>   * IO memory/port resource management mechanism is used to check
> @@ -486,15 +508,9 @@ int apei_resources_request(struct apei_resources *resources,
>  	if (rc)
>  		goto nvs_res_fini;
>  
> -	if (arch_apei_filter_addr) {
> -		apei_resources_init(&arch_res);
> -		rc = apei_get_arch_resources(&arch_res);
> -		if (rc)
> -			goto arch_res_fini;
> -		rc = apei_resources_sub(resources, &arch_res);
> -		if (rc)
> -			goto arch_res_fini;
> -	}
> +	rc = apei_filter_mcfg_addr(resources, &arch_res);
> +	if (rc)
> +		goto arch_res_fini;
>  
>  	rc = -EINVAL;
>  	list_for_each_entry(res, &resources->iomem, list) {
> @@ -544,8 +560,7 @@ int apei_resources_request(struct apei_resources *resources,
>  		release_mem_region(res->start, res->end - res->start);
>  	}
>  arch_res_fini:
> -	if (arch_apei_filter_addr)
> -		apei_resources_fini(&arch_res);
> +	apei_resources_fini(&arch_res);
>  nvs_res_fini:
>  	apei_resources_fini(&nvs_resources);
>  	return rc;
> -- 
> 1.8.3.1
>
Xuesong Chen Oct. 20, 2021, 2:35 a.m. UTC | #3
On 19/10/2021 23:04, Bjorn Helgaas wrote:
> On Tue, Oct 19, 2021 at 12:50:33PM +0800, Xuesong Chen wrote:
>> The commit d91525eb8ee6 ("ACPI, EINJ: Enhance error injection tolerance
>> level") fixes the issue that the ACPI/APEI can not access the PCI MCFG
>> address on x86 platform, but this issue can also happen on other
>> architectures, for instance, we got below error message on arm64 platform:
>> ...
>> APEI: Can not request [mem 0x50100000-0x50100003] for APEI EINJ Trigger registers
>> ...
>>
>> This patch will try to handle this case in a more common way instead of the
>> original 'arch' specific solution, which will be beneficial to all the
>> APEI-dependent platforms after that.
>>
>> Signed-off-by: Xuesong Chen <xuesong.chen@linux.alibaba.com>
>> Reported-by: kernel test robot <lkp@intel.com>
> 
> The purpose of this patch is not to fix a problem reported by the
> kernel test robot, so remove this tag.

Yes, will do
> 
> I know the robot found a problem with a previous version of this
> patch, but we treat that the same as a code review comment.  We
> normally don't explicitly credit reviewers unless it was something
> major, and then it would go in the commit log, not a "Reported-by"
> tag.
> 
> It makes sense to credit the kernel test robot for things found in
> Linus' tree, but it's a little too aggressive about suggesting the tag
> for problems with unmerged changes.
> 
>> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> 
> This tag can only be added when Lorenzo explicitly supplies it
> himself.  I do not see that on the mailing list, so please remove this
> tag as well.  After Lorenzo supplies it, you can include it in future
> postings as long as you don't make significant changes to the patch.

En, Lorenzo does have comments on the patch#2 and I also update that patch
according to his feedback, so why the tag is here. OK, I'll add this tag
if Lorenzo can supply it explicitly before I send the next version.
> 
> Bjorn
>
Xuesong Chen Oct. 20, 2021, 3:16 a.m. UTC | #4
On 20/10/2021 03:23, Bjorn Helgaas wrote:
> [+cc Huang in case I'm misinterpreting EINJ resource requests]
> 
> On Tue, Oct 19, 2021 at 12:50:33PM +0800, Xuesong Chen wrote:
>> The commit d91525eb8ee6 ("ACPI, EINJ: Enhance error injection tolerance
>> level") fixes the issue that the ACPI/APEI can not access the PCI MCFG
>> address on x86 platform, but this issue can also happen on other
>> architectures, for instance, we got below error message on arm64 platform:
>> ...
>> APEI: Can not request [mem 0x50100000-0x50100003] for APEI EINJ Trigger registers
> 
> I'm guessing this address is something in the MCFG area?  

Right, correct! The address 0x50100000 is within the MCFG area.

> I wish we
> could also include the matching MCFG info For x86, we put something
> like this in the dmesg log, but I guess pci_mcfg.c doesn't do this:
> 
>   PCI: MMCONFIG for domain 0000 [bus 00-7f] at [mem 0xf0000000-0xf7ffffff] (base 0xf0000000)
> 
I can add the similar dmesg log in pci_mcfg.c to make it's consistent between different arches

>> This patch will try to handle this case in a more common way instead of the
>> original 'arch' specific solution, which will be beneficial to all the
>> APEI-dependent platforms after that.
> 
> This actually doesn't say anything about what the patch does or how it
> works.  It says "handles this case in a more common way" but with no
> details.

Good suggestion, I'll give more details about that...
> The EINJ table contains "injection instructions" that can read or
> write "register regions" described by generic address structures (see
> ACPI v6.3, sec 18.6.2 and 18.6.3), and __einj_error_trigger() requests
> those register regions with request_mem_region() or request_region()
> before executing the injections instructions.
> 
> IIUC, this patch basically says "if this region is part of the MCFG
> area, we don't need to reserve it." That leads to the questions of why
> we need to reserve *any* of the areas

AFAIK, the MCFG area is reserved since the ECAM module will provide a
generic Kernel Programming Interfaces(KPI), e.g, pci_generic_config_read(...),
so all the drivers are allowed to access the pci config space only by those
KPIs in a consistent and safe way, direct raw access will break the rule.
Correct me if I am missing sth.

> and why it's safe to simply skip
> reserving regions that are part of the MCFG area.

Actual there is a commit d91525eb8ee6("ACPI, EINJ: Enhance error injection tolerance
level") before to address this issue, the entire commit log as below:

    Some BIOSes utilize PCI MMCFG space read/write opertion to trigger
    specific errors. EINJ will report errors as below when hitting such
    cases:
    
    APEI: Can not request [mem 0x83f990a0-0x83f990a3] for APEI EINJ Trigger registers
    
    It is because on x86 platform ACPI based PCI MMCFG logic has
    reserved all MMCFG spaces so that EINJ can't reserve it again.
    We already trust the ACPI/APEI code when using the EINJ interface
    so it is not a big leap to also trust it to access the right
    MMCFG addresses. Skip address checking to allow the access.
    
    Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
    Signed-off-by: Tony Luck <tony.luck@intel.com>

Except that the above explanation, IMO the EINJ is only a RAS debug framework, 
in this code path, sometimes we need to acesss the address within the MCFG space
directly to trigger kind of HW error, which behavior does not like the normal
device driver's, in this case some possible unsafe operations(bypass the ecam ops)
can be mitigated because the touched device will generate some HW errors and the
RAS handling part will preempt its corresponding drivers to fix/log the HW error, 
that's my understanding about that.

> 
>> Signed-off-by: Xuesong Chen <xuesong.chen@linux.alibaba.com>
>> Reported-by: kernel test robot <lkp@intel.com>
>> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: James Morse <james.morse@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Rafael. J. Wysocki <rafael@kernel.org>
>> Cc: Tony Luck <tony.luck@intel.com>
>> Cc: Tomasz Nowicki <tn@semihalf.com>
>> ---
>>  arch/x86/pci/mmconfig-shared.c | 28 --------------------------
>>  drivers/acpi/apei/apei-base.c  | 45 ++++++++++++++++++++++++++++--------------
>>  2 files changed, 30 insertions(+), 43 deletions(-)
>>
>> diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
>> index 0b961fe6..12f7d96 100644
>> --- a/arch/x86/pci/mmconfig-shared.c
>> +++ b/arch/x86/pci/mmconfig-shared.c
>> @@ -605,32 +605,6 @@ static int __init pci_parse_mcfg(struct acpi_table_header *header)
>>  	return 0;
>>  }
>>  
>> -#ifdef CONFIG_ACPI_APEI
>> -extern int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size,
>> -				     void *data), void *data);
>> -
>> -static int pci_mmcfg_for_each_region(int (*func)(__u64 start, __u64 size,
>> -				     void *data), void *data)
>> -{
>> -	struct pci_mmcfg_region *cfg;
>> -	int rc;
>> -
>> -	if (list_empty(&pci_mmcfg_list))
>> -		return 0;
>> -
>> -	list_for_each_entry(cfg, &pci_mmcfg_list, list) {
>> -		rc = func(cfg->res.start, resource_size(&cfg->res), data);
>> -		if (rc)
>> -			return rc;
>> -	}
>> -
>> -	return 0;
>> -}
>> -#define set_apei_filter() (arch_apei_filter_addr = pci_mmcfg_for_each_region)
>> -#else
>> -#define set_apei_filter()
>> -#endif
>> -
>>  static void __init __pci_mmcfg_init(int early)
>>  {
>>  	pci_mmcfg_reject_broken(early);
>> @@ -665,8 +639,6 @@ void __init pci_mmcfg_early_init(void)
>>  		else
>>  			acpi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
>>  		__pci_mmcfg_init(1);
>> -
>> -		set_apei_filter();
>>  	}
>>  }
>>  
>> diff --git a/drivers/acpi/apei/apei-base.c b/drivers/acpi/apei/apei-base.c
>> index c7fdb12..daae75a 100644
>> --- a/drivers/acpi/apei/apei-base.c
>> +++ b/drivers/acpi/apei/apei-base.c
>> @@ -21,6 +21,7 @@
>>  #include <linux/kernel.h>
>>  #include <linux/module.h>
>>  #include <linux/init.h>
>> +#include <linux/pci.h>
>>  #include <linux/acpi.h>
>>  #include <linux/slab.h>
>>  #include <linux/io.h>
>> @@ -448,13 +449,34 @@ static int apei_get_nvs_resources(struct apei_resources *resources)
>>  	return acpi_nvs_for_each_region(apei_get_res_callback, resources);
>>  }
>>  
>> -int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size,
>> -				     void *data), void *data);
>> -static int apei_get_arch_resources(struct apei_resources *resources)
>> +#ifdef CONFIG_PCI
>> +extern struct list_head pci_mmcfg_list;
>> +static int apei_filter_mcfg_addr(struct apei_resources *res,
>> +			struct apei_resources *mcfg_res)
>> +{
>> +	int rc = 0;
>> +	struct pci_mmcfg_region *cfg;
>> +
>> +	if (list_empty(&pci_mmcfg_list))
>> +		return 0;
>> +
>> +	apei_resources_init(mcfg_res);
>> +	list_for_each_entry(cfg, &pci_mmcfg_list, list) {
>> +		rc = apei_res_add(&mcfg_res->iomem, cfg->res.start, resource_size(&cfg->res));
>> +		if (rc)
>> +			return rc;
>> +	}
>>  
>> +	/* filter the mcfg resource from current APEI's */
>> +	return apei_resources_sub(res, mcfg_res);
>> +}
>> +#else
>> +static inline int apei_filter_mcfg_addr(struct apei_resources *res,
>> +			struct apei_resources *mcfg_res)
>>  {
>> -	return arch_apei_filter_addr(apei_get_res_callback, resources);
>> +	return 0;
>>  }
>> +#endif
>>  
>>  /*
>>   * IO memory/port resource management mechanism is used to check
>> @@ -486,15 +508,9 @@ int apei_resources_request(struct apei_resources *resources,
>>  	if (rc)
>>  		goto nvs_res_fini;
>>  
>> -	if (arch_apei_filter_addr) {
>> -		apei_resources_init(&arch_res);
>> -		rc = apei_get_arch_resources(&arch_res);
>> -		if (rc)
>> -			goto arch_res_fini;
>> -		rc = apei_resources_sub(resources, &arch_res);
>> -		if (rc)
>> -			goto arch_res_fini;
>> -	}
>> +	rc = apei_filter_mcfg_addr(resources, &arch_res);
>> +	if (rc)
>> +		goto arch_res_fini;
>>  
>>  	rc = -EINVAL;
>>  	list_for_each_entry(res, &resources->iomem, list) {
>> @@ -544,8 +560,7 @@ int apei_resources_request(struct apei_resources *resources,
>>  		release_mem_region(res->start, res->end - res->start);
>>  	}
>>  arch_res_fini:
>> -	if (arch_apei_filter_addr)
>> -		apei_resources_fini(&arch_res);
>> +	apei_resources_fini(&arch_res);
>>  nvs_res_fini:
>>  	apei_resources_fini(&nvs_resources);
>>  	return rc;
>> -- 
>> 1.8.3.1
>>
Bjorn Helgaas Oct. 20, 2021, 6:50 p.m. UTC | #5
[+cc Gong, author of d91525eb8ee6]

On Wed, Oct 20, 2021 at 11:16:38AM +0800, Xuesong Chen wrote:
> On 20/10/2021 03:23, Bjorn Helgaas wrote:
> > On Tue, Oct 19, 2021 at 12:50:33PM +0800, Xuesong Chen wrote:

> > I wish we could also include the matching MCFG info For x86, we
> > put something like this in the dmesg log, but I guess pci_mcfg.c
> > doesn't do this:
> > 
> >   PCI: MMCONFIG for domain 0000 [bus 00-7f] at [mem 0xf0000000-0xf7ffffff] (base 0xf0000000)
>
> I can add the similar dmesg log in pci_mcfg.c to make it's
> consistent between different arches

I think that might be nice.  It should probably be a separate patch
since it's not really related to the others.

> >> This patch will try to handle this case in a more common way instead of the
> >> original 'arch' specific solution, which will be beneficial to all the
> >> APEI-dependent platforms after that.
> > 
> > This actually doesn't say anything about what the patch does or how it
> > works.  It says "handles this case in a more common way" but with no
> > details.
> 
> Good suggestion, I'll give more details about that...
> > The EINJ table contains "injection instructions" that can read or
> > write "register regions" described by generic address structures (see
> > ACPI v6.3, sec 18.6.2 and 18.6.3), and __einj_error_trigger() requests
> > those register regions with request_mem_region() or request_region()
> > before executing the injections instructions.
> > 
> > IIUC, this patch basically says "if this region is part of the MCFG
> > area, we don't need to reserve it." That leads to the questions of why
> > we need to reserve *any* of the areas
> 
> AFAIK, the MCFG area is reserved since the ECAM module will provide
> a generic Kernel Programming Interfaces(KPI), e.g,
> pci_generic_config_read(...), so all the drivers are allowed to
> access the pci config space only by those KPIs in a consistent and
> safe way, direct raw access will break the rule.  Correct me if I am
> missing sth.
> 
> > and why it's safe to simply skip reserving regions that are part
> > of the MCFG area.
> 
> Actual there is a commit d91525eb8ee6("ACPI, EINJ: Enhance error
> injection tolerance level") before to address this issue, the entire
> commit log as below:
> 
>     Some BIOSes utilize PCI MMCFG space read/write opertion to trigger
>     specific errors. EINJ will report errors as below when hitting such
>     cases:
>     
>     APEI: Can not request [mem 0x83f990a0-0x83f990a3] for APEI EINJ Trigger registers
>     
>     It is because on x86 platform ACPI based PCI MMCFG logic has
>     reserved all MMCFG spaces so that EINJ can't reserve it again.
>     We already trust the ACPI/APEI code when using the EINJ interface
>     so it is not a big leap to also trust it to access the right
>     MMCFG addresses. Skip address checking to allow the access.

I'm not really convinced by that justification because I don't think
the issue here is *trust*.  If all we care about is trust, and we
trust the ACPI/APEI code, why do we need to reserve anything at all
when executing EINJ actions?

I think the resource reservation issue is about coordinating multiple
users of the address space.  A driver reserves the MMIO address space
of a device it controls so no other driver can reserve it at the same
time and cause conflicts.

I'm not really convinced by this mutual exclusion argument either,
because I haven't yet seen a situation where we say "EINJ needs a
resource that's already in use by somebody else, so we can't use
EINJ."  When conflicts arise, the response is always "we'll just
stop reserving this conflicting resource but use it anyway."

I think the only real value in apei_resources_request() is a little
bit of documentation in /proc/iomem.  For ERST and EINJ, even that
only lasts for the tiny period when we're actually executing an
action.

So convince me there's a reason why we shouldn't just remove
apei_resources_request() completely :)

> Except that the above explanation, IMO the EINJ is only a RAS debug
> framework, in this code path, sometimes we need to acesss the
> address within the MCFG space directly to trigger kind of HW error,
> which behavior does not like the normal device driver's, in this
> case some possible unsafe operations (bypass the ecam ops) can be
> mitigated because the touched device will generate some HW errors
> and the RAS handling part will preempt its corresponding drivers to
> fix/log the HW error, that's my understanding about that.

> >> Signed-off-by: Xuesong Chen <xuesong.chen@linux.alibaba.com>
> >> Reported-by: kernel test robot <lkp@intel.com>
> >> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> >> Cc: Catalin Marinas <catalin.marinas@arm.com>
> >> Cc: James Morse <james.morse@arm.com>
> >> Cc: Will Deacon <will@kernel.org>
> >> Cc: Rafael. J. Wysocki <rafael@kernel.org>
> >> Cc: Tony Luck <tony.luck@intel.com>
> >> Cc: Tomasz Nowicki <tn@semihalf.com>
> >> ---
> >>  arch/x86/pci/mmconfig-shared.c | 28 --------------------------
> >>  drivers/acpi/apei/apei-base.c  | 45 ++++++++++++++++++++++++++++--------------
> >>  2 files changed, 30 insertions(+), 43 deletions(-)
> >>
> >> diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
> >> index 0b961fe6..12f7d96 100644
> >> --- a/arch/x86/pci/mmconfig-shared.c
> >> +++ b/arch/x86/pci/mmconfig-shared.c
> >> @@ -605,32 +605,6 @@ static int __init pci_parse_mcfg(struct acpi_table_header *header)
> >>  	return 0;
> >>  }
> >>  
> >> -#ifdef CONFIG_ACPI_APEI
> >> -extern int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size,
> >> -				     void *data), void *data);
> >> -
> >> -static int pci_mmcfg_for_each_region(int (*func)(__u64 start, __u64 size,
> >> -				     void *data), void *data)
> >> -{
> >> -	struct pci_mmcfg_region *cfg;
> >> -	int rc;
> >> -
> >> -	if (list_empty(&pci_mmcfg_list))
> >> -		return 0;
> >> -
> >> -	list_for_each_entry(cfg, &pci_mmcfg_list, list) {
> >> -		rc = func(cfg->res.start, resource_size(&cfg->res), data);
> >> -		if (rc)
> >> -			return rc;
> >> -	}
> >> -
> >> -	return 0;
> >> -}
> >> -#define set_apei_filter() (arch_apei_filter_addr = pci_mmcfg_for_each_region)
> >> -#else
> >> -#define set_apei_filter()
> >> -#endif
> >> -
> >>  static void __init __pci_mmcfg_init(int early)
> >>  {
> >>  	pci_mmcfg_reject_broken(early);
> >> @@ -665,8 +639,6 @@ void __init pci_mmcfg_early_init(void)
> >>  		else
> >>  			acpi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
> >>  		__pci_mmcfg_init(1);
> >> -
> >> -		set_apei_filter();
> >>  	}
> >>  }
> >>  
> >> diff --git a/drivers/acpi/apei/apei-base.c b/drivers/acpi/apei/apei-base.c
> >> index c7fdb12..daae75a 100644
> >> --- a/drivers/acpi/apei/apei-base.c
> >> +++ b/drivers/acpi/apei/apei-base.c
> >> @@ -21,6 +21,7 @@
> >>  #include <linux/kernel.h>
> >>  #include <linux/module.h>
> >>  #include <linux/init.h>
> >> +#include <linux/pci.h>
> >>  #include <linux/acpi.h>
> >>  #include <linux/slab.h>
> >>  #include <linux/io.h>
> >> @@ -448,13 +449,34 @@ static int apei_get_nvs_resources(struct apei_resources *resources)
> >>  	return acpi_nvs_for_each_region(apei_get_res_callback, resources);
> >>  }
> >>  
> >> -int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size,
> >> -				     void *data), void *data);
> >> -static int apei_get_arch_resources(struct apei_resources *resources)
> >> +#ifdef CONFIG_PCI
> >> +extern struct list_head pci_mmcfg_list;
> >> +static int apei_filter_mcfg_addr(struct apei_resources *res,
> >> +			struct apei_resources *mcfg_res)
> >> +{
> >> +	int rc = 0;
> >> +	struct pci_mmcfg_region *cfg;
> >> +
> >> +	if (list_empty(&pci_mmcfg_list))
> >> +		return 0;
> >> +
> >> +	apei_resources_init(mcfg_res);
> >> +	list_for_each_entry(cfg, &pci_mmcfg_list, list) {
> >> +		rc = apei_res_add(&mcfg_res->iomem, cfg->res.start, resource_size(&cfg->res));
> >> +		if (rc)
> >> +			return rc;
> >> +	}
> >>  
> >> +	/* filter the mcfg resource from current APEI's */
> >> +	return apei_resources_sub(res, mcfg_res);
> >> +}
> >> +#else
> >> +static inline int apei_filter_mcfg_addr(struct apei_resources *res,
> >> +			struct apei_resources *mcfg_res)
> >>  {
> >> -	return arch_apei_filter_addr(apei_get_res_callback, resources);
> >> +	return 0;
> >>  }
> >> +#endif
> >>  
> >>  /*
> >>   * IO memory/port resource management mechanism is used to check
> >> @@ -486,15 +508,9 @@ int apei_resources_request(struct apei_resources *resources,
> >>  	if (rc)
> >>  		goto nvs_res_fini;
> >>  
> >> -	if (arch_apei_filter_addr) {
> >> -		apei_resources_init(&arch_res);
> >> -		rc = apei_get_arch_resources(&arch_res);
> >> -		if (rc)
> >> -			goto arch_res_fini;
> >> -		rc = apei_resources_sub(resources, &arch_res);
> >> -		if (rc)
> >> -			goto arch_res_fini;
> >> -	}
> >> +	rc = apei_filter_mcfg_addr(resources, &arch_res);
> >> +	if (rc)
> >> +		goto arch_res_fini;
> >>  
> >>  	rc = -EINVAL;
> >>  	list_for_each_entry(res, &resources->iomem, list) {
> >> @@ -544,8 +560,7 @@ int apei_resources_request(struct apei_resources *resources,
> >>  		release_mem_region(res->start, res->end - res->start);
> >>  	}
> >>  arch_res_fini:
> >> -	if (arch_apei_filter_addr)
> >> -		apei_resources_fini(&arch_res);
> >> +	apei_resources_fini(&arch_res);
> >>  nvs_res_fini:
> >>  	apei_resources_fini(&nvs_resources);
> >>  	return rc;
> >> -- 
> >> 1.8.3.1
> >>
Xuesong Chen Oct. 21, 2021, 3:46 p.m. UTC | #6
On 21/10/2021 02:50, Bjorn Helgaas wrote:
> [+cc Gong, author of d91525eb8ee6]
> 
> On Wed, Oct 20, 2021 at 11:16:38AM +0800, Xuesong Chen wrote:
>> On 20/10/2021 03:23, Bjorn Helgaas wrote:
>>> On Tue, Oct 19, 2021 at 12:50:33PM +0800, Xuesong Chen wrote:
> 
>>> I wish we could also include the matching MCFG info For x86, we
>>> put something like this in the dmesg log, but I guess pci_mcfg.c
>>> doesn't do this:
>>>
>>>   PCI: MMCONFIG for domain 0000 [bus 00-7f] at [mem 0xf0000000-0xf7ffffff] (base 0xf0000000)
>>
>> I can add the similar dmesg log in pci_mcfg.c to make it's
>> consistent between different arches
> 
> I think that might be nice.  It should probably be a separate patch
> since it's not really related to the others.

Yes, will do.

> 
>>>> This patch will try to handle this case in a more common way instead of the
>>>> original 'arch' specific solution, which will be beneficial to all the
>>>> APEI-dependent platforms after that.
>>>
>>> This actually doesn't say anything about what the patch does or how it
>>> works.  It says "handles this case in a more common way" but with no
>>> details.
>>
>> Good suggestion, I'll give more details about that...
>>> The EINJ table contains "injection instructions" that can read or
>>> write "register regions" described by generic address structures (see
>>> ACPI v6.3, sec 18.6.2 and 18.6.3), and __einj_error_trigger() requests
>>> those register regions with request_mem_region() or request_region()
>>> before executing the injections instructions.
>>>
>>> IIUC, this patch basically says "if this region is part of the MCFG
>>> area, we don't need to reserve it." That leads to the questions of why
>>> we need to reserve *any* of the areas
>>
>> AFAIK, the MCFG area is reserved since the ECAM module will provide
>> a generic Kernel Programming Interfaces(KPI), e.g,
>> pci_generic_config_read(...), so all the drivers are allowed to
>> access the pci config space only by those KPIs in a consistent and
>> safe way, direct raw access will break the rule.  Correct me if I am
>> missing sth.
>>
>>> and why it's safe to simply skip reserving regions that are part
>>> of the MCFG area.
>>
>> Actual there is a commit d91525eb8ee6("ACPI, EINJ: Enhance error
>> injection tolerance level") before to address this issue, the entire
>> commit log as below:
>>
>>     Some BIOSes utilize PCI MMCFG space read/write opertion to trigger
>>     specific errors. EINJ will report errors as below when hitting such
>>     cases:
>>     
>>     APEI: Can not request [mem 0x83f990a0-0x83f990a3] for APEI EINJ Trigger registers
>>     
>>     It is because on x86 platform ACPI based PCI MMCFG logic has
>>     reserved all MMCFG spaces so that EINJ can't reserve it again.
>>     We already trust the ACPI/APEI code when using the EINJ interface
>>     so it is not a big leap to also trust it to access the right
>>     MMCFG addresses. Skip address checking to allow the access.
> 
> I'm not really convinced by that justification because I don't think
> the issue here is *trust*.  If all we care about is trust, and we
> trust the ACPI/APEI code, why do we need to reserve anything at all
> when executing EINJ actions?
> 
> I think the resource reservation issue is about coordinating multiple
> users of the address space.  A driver reserves the MMIO address space
> of a device it controls so no other driver can reserve it at the same
> time and cause conflicts.
> 
> I'm not really convinced by this mutual exclusion argument either,
> because I haven't yet seen a situation where we say "EINJ needs a
> resource that's already in use by somebody else, so we can't use
> EINJ."  When conflicts arise, the response is always "we'll just
> stop reserving this conflicting resource but use it anyway."
> 
> I think the only real value in apei_resources_request() is a little
> bit of documentation in /proc/iomem.  For ERST and EINJ, even that
> only lasts for the tiny period when we're actually executing an
> action.
> 
> So convince me there's a reason why we shouldn't just remove
> apei_resources_request() completely :)
> 

I have to confess that currently I have no strong evidence/reason to convince
you that it's absolute safe to remove apei_resources_request(),  probably in
some conditions it *does* require to follow the mutual exclusion usage model.
The ECAM/MCFG maybe a special case not like other normal device driver, since
all its MCFG space has been reserved during the initialization. Anyway, it's
another topic and good point well worth discussing in the future.

From the patch set itself, I don't think it's a nice idea to make a dramatic
change regarding the apei_resources_request() part, I suggest to keep the original
rationale untouched and based on that to fix the real issue at hand in a more
generic way.

Any thought? If no objection, I plan to finalize the next version patch according
to the previous feedbacks and send it out for reviewing soon...

Thanks,
Xuesong

>> Except that the above explanation, IMO the EINJ is only a RAS debug
>> framework, in this code path, sometimes we need to acesss the
>> address within the MCFG space directly to trigger kind of HW error,
>> which behavior does not like the normal device driver's, in this
>> case some possible unsafe operations (bypass the ecam ops) can be
>> mitigated because the touched device will generate some HW errors
>> and the RAS handling part will preempt its corresponding drivers to
>> fix/log the HW error, that's my understanding about that.
> 
>>>> Signed-off-by: Xuesong Chen <xuesong.chen@linux.alibaba.com>
>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
>>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>>> Cc: James Morse <james.morse@arm.com>
>>>> Cc: Will Deacon <will@kernel.org>
>>>> Cc: Rafael. J. Wysocki <rafael@kernel.org>
>>>> Cc: Tony Luck <tony.luck@intel.com>
>>>> Cc: Tomasz Nowicki <tn@semihalf.com>
>>>> ---
>>>>  arch/x86/pci/mmconfig-shared.c | 28 --------------------------
>>>>  drivers/acpi/apei/apei-base.c  | 45 ++++++++++++++++++++++++++++--------------
>>>>  2 files changed, 30 insertions(+), 43 deletions(-)
>>>>
>>>> diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
>>>> index 0b961fe6..12f7d96 100644
>>>> --- a/arch/x86/pci/mmconfig-shared.c
>>>> +++ b/arch/x86/pci/mmconfig-shared.c
>>>> @@ -605,32 +605,6 @@ static int __init pci_parse_mcfg(struct acpi_table_header *header)
>>>>  	return 0;
>>>>  }
>>>>  
>>>> -#ifdef CONFIG_ACPI_APEI
>>>> -extern int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size,
>>>> -				     void *data), void *data);
>>>> -
>>>> -static int pci_mmcfg_for_each_region(int (*func)(__u64 start, __u64 size,
>>>> -				     void *data), void *data)
>>>> -{
>>>> -	struct pci_mmcfg_region *cfg;
>>>> -	int rc;
>>>> -
>>>> -	if (list_empty(&pci_mmcfg_list))
>>>> -		return 0;
>>>> -
>>>> -	list_for_each_entry(cfg, &pci_mmcfg_list, list) {
>>>> -		rc = func(cfg->res.start, resource_size(&cfg->res), data);
>>>> -		if (rc)
>>>> -			return rc;
>>>> -	}
>>>> -
>>>> -	return 0;
>>>> -}
>>>> -#define set_apei_filter() (arch_apei_filter_addr = pci_mmcfg_for_each_region)
>>>> -#else
>>>> -#define set_apei_filter()
>>>> -#endif
>>>> -
>>>>  static void __init __pci_mmcfg_init(int early)
>>>>  {
>>>>  	pci_mmcfg_reject_broken(early);
>>>> @@ -665,8 +639,6 @@ void __init pci_mmcfg_early_init(void)
>>>>  		else
>>>>  			acpi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
>>>>  		__pci_mmcfg_init(1);
>>>> -
>>>> -		set_apei_filter();
>>>>  	}
>>>>  }
>>>>  
>>>> diff --git a/drivers/acpi/apei/apei-base.c b/drivers/acpi/apei/apei-base.c
>>>> index c7fdb12..daae75a 100644
>>>> --- a/drivers/acpi/apei/apei-base.c
>>>> +++ b/drivers/acpi/apei/apei-base.c
>>>> @@ -21,6 +21,7 @@
>>>>  #include <linux/kernel.h>
>>>>  #include <linux/module.h>
>>>>  #include <linux/init.h>
>>>> +#include <linux/pci.h>
>>>>  #include <linux/acpi.h>
>>>>  #include <linux/slab.h>
>>>>  #include <linux/io.h>
>>>> @@ -448,13 +449,34 @@ static int apei_get_nvs_resources(struct apei_resources *resources)
>>>>  	return acpi_nvs_for_each_region(apei_get_res_callback, resources);
>>>>  }
>>>>  
>>>> -int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size,
>>>> -				     void *data), void *data);
>>>> -static int apei_get_arch_resources(struct apei_resources *resources)
>>>> +#ifdef CONFIG_PCI
>>>> +extern struct list_head pci_mmcfg_list;
>>>> +static int apei_filter_mcfg_addr(struct apei_resources *res,
>>>> +			struct apei_resources *mcfg_res)
>>>> +{
>>>> +	int rc = 0;
>>>> +	struct pci_mmcfg_region *cfg;
>>>> +
>>>> +	if (list_empty(&pci_mmcfg_list))
>>>> +		return 0;
>>>> +
>>>> +	apei_resources_init(mcfg_res);
>>>> +	list_for_each_entry(cfg, &pci_mmcfg_list, list) {
>>>> +		rc = apei_res_add(&mcfg_res->iomem, cfg->res.start, resource_size(&cfg->res));
>>>> +		if (rc)
>>>> +			return rc;
>>>> +	}
>>>>  
>>>> +	/* filter the mcfg resource from current APEI's */
>>>> +	return apei_resources_sub(res, mcfg_res);
>>>> +}
>>>> +#else
>>>> +static inline int apei_filter_mcfg_addr(struct apei_resources *res,
>>>> +			struct apei_resources *mcfg_res)
>>>>  {
>>>> -	return arch_apei_filter_addr(apei_get_res_callback, resources);
>>>> +	return 0;
>>>>  }
>>>> +#endif
>>>>  
>>>>  /*
>>>>   * IO memory/port resource management mechanism is used to check
>>>> @@ -486,15 +508,9 @@ int apei_resources_request(struct apei_resources *resources,
>>>>  	if (rc)
>>>>  		goto nvs_res_fini;
>>>>  
>>>> -	if (arch_apei_filter_addr) {
>>>> -		apei_resources_init(&arch_res);
>>>> -		rc = apei_get_arch_resources(&arch_res);
>>>> -		if (rc)
>>>> -			goto arch_res_fini;
>>>> -		rc = apei_resources_sub(resources, &arch_res);
>>>> -		if (rc)
>>>> -			goto arch_res_fini;
>>>> -	}
>>>> +	rc = apei_filter_mcfg_addr(resources, &arch_res);
>>>> +	if (rc)
>>>> +		goto arch_res_fini;
>>>>  
>>>>  	rc = -EINVAL;
>>>>  	list_for_each_entry(res, &resources->iomem, list) {
>>>> @@ -544,8 +560,7 @@ int apei_resources_request(struct apei_resources *resources,
>>>>  		release_mem_region(res->start, res->end - res->start);
>>>>  	}
>>>>  arch_res_fini:
>>>> -	if (arch_apei_filter_addr)
>>>> -		apei_resources_fini(&arch_res);
>>>> +	apei_resources_fini(&arch_res);
>>>>  nvs_res_fini:
>>>>  	apei_resources_fini(&nvs_resources);
>>>>  	return rc;
>>>> -- 
>>>> 1.8.3.1
>>>>
Bjorn Helgaas Oct. 21, 2021, 4:57 p.m. UTC | #7
On Thu, Oct 21, 2021 at 11:46:40PM +0800, Xuesong Chen wrote:
> On 21/10/2021 02:50, Bjorn Helgaas wrote:
> > On Wed, Oct 20, 2021 at 11:16:38AM +0800, Xuesong Chen wrote:
> >> On 20/10/2021 03:23, Bjorn Helgaas wrote:
> >>> On Tue, Oct 19, 2021 at 12:50:33PM +0800, Xuesong Chen wrote:

> >>>> This patch will try to handle this case in a more common way
> >>>> instead of the original 'arch' specific solution, which will be
> >>>> beneficial to all the APEI-dependent platforms after that.
> >>>
> >>> This actually doesn't say anything about what the patch does or
> >>> how it works.  It says "handles this case in a more common way"
> >>> but with no details.
> >>
> >> Good suggestion, I'll give more details about that...
> >>
> >>> The EINJ table contains "injection instructions" that can read
> >>> or write "register regions" described by generic address
> >>> structures (see ACPI v6.3, sec 18.6.2 and 18.6.3), and
> >>> __einj_error_trigger() requests those register regions with
> >>> request_mem_region() or request_region() before executing the
> >>> injections instructions.
> >>>
> >>> IIUC, this patch basically says "if this region is part of the
> >>> MCFG area, we don't need to reserve it." That leads to the
> >>> questions of why we need to reserve *any* of the areas
> >>
> >> AFAIK, the MCFG area is reserved since the ECAM module will
> >> provide a generic Kernel Programming Interfaces(KPI), e.g,
> >> pci_generic_config_read(...), so all the drivers are allowed to
> >> access the pci config space only by those KPIs in a consistent
> >> and safe way, direct raw access will break the rule.  Correct me
> >> if I am missing sth.
> >>
> >>> and why it's safe to simply skip reserving regions that are part
> >>> of the MCFG area.
> >>
> >> Actual there is a commit d91525eb8ee6("ACPI, EINJ: Enhance error
> >> injection tolerance level") before to address this issue, the
> >> entire commit log as below:
> >>
> >>     Some BIOSes utilize PCI MMCFG space read/write opertion to trigger
> >>     specific errors. EINJ will report errors as below when hitting such
> >>     cases:
> >>     
> >>     APEI: Can not request [mem 0x83f990a0-0x83f990a3] for APEI EINJ Trigger registers
> >>     
> >>     It is because on x86 platform ACPI based PCI MMCFG logic has
> >>     reserved all MMCFG spaces so that EINJ can't reserve it again.
> >>     We already trust the ACPI/APEI code when using the EINJ interface
> >>     so it is not a big leap to also trust it to access the right
> >>     MMCFG addresses. Skip address checking to allow the access.
> > 
> > I'm not really convinced by that justification because I don't
> > think the issue here is *trust*.  If all we care about is trust,
> > and we trust the ACPI/APEI code, why do we need to reserve
> > anything at all when executing EINJ actions?
> > 
> > I think the resource reservation issue is about coordinating
> > multiple users of the address space.  A driver reserves the MMIO
> > address space of a device it controls so no other driver can
> > reserve it at the same time and cause conflicts.
> > 
> > I'm not really convinced by this mutual exclusion argument either,
> > because I haven't yet seen a situation where we say "EINJ needs a
> > resource that's already in use by somebody else, so we can't use
> > EINJ."  When conflicts arise, the response is always "we'll just
> > stop reserving this conflicting resource but use it anyway."
> > 
> > I think the only real value in apei_resources_request() is a
> > little bit of documentation in /proc/iomem.  For ERST and EINJ,
> > even that only lasts for the tiny period when we're actually
> > executing an action.
> > 
> > So convince me there's a reason why we shouldn't just remove
> > apei_resources_request() completely :)
> 
> I have to confess that currently I have no strong evidence/reason to
> convince you that it's absolute safe to remove
> apei_resources_request(),  probably in some conditions it *does*
> require to follow the mutual exclusion usage model.  The ECAM/MCFG
> maybe a special case not like other normal device driver, since all
> its MCFG space has been reserved during the initialization. Anyway,
> it's another topic and good point well worth discussing in the
> future.

This is missing the point.  It's not the MCFG reservation during
initialization that would make this safe.  What would make it safe is
the fact that ECAM does not require mutual exclusion.

When the hardware implements ECAM correctly, PCI config accesses do
not require locking because a config access requires a single MMIO
load or store.

Many non-ECAM config accessors *do* require locking because they use
several register accesses, e.g., the 0xCF8/0xCFC address/data pairs
used by pci_conf1_read().  If EINJ actions used these, we would have
to enforce mutual exclusion between EINJ config accesses and those
done by other drivers.

Some ARM64 platforms do not implement ECAM correctly, e.g.,
tegra194_map_bus() programs an outbound ATU and xgene_pcie_map_bus()
sets an RTDID register before the MMIO load/store.  Platforms like
this *do* require mutual exclusion between an EINJ config access and
other config accesses.

These platforms are supported via quirks in pci_mcfg.c, so they will
have resources in the pci_mcfg_list, and if we just ignore all the
MCFG resources in apei_resources_request(), there will be nothing to
prevent ordinary driver config accesses from being corrupted by EINJ
accesses.

I think in general, is probably *is* safe to remove MCFG resources
from the APEI reservations, but it would be better if we had some way
to prevent EINJ from using MCFG on platforms like tegra194 and xgene.

> From the patch set itself, I don't think it's a nice idea to make a
> dramatic change regarding the apei_resources_request() part, I
> suggest to keep the original rationale untouched and based on that
> to fix the real issue at hand in a more generic way.

There *was* no original rationale.  The whole point of this
conversation is to figure out what the real rationale is.

> >> Except that the above explanation, IMO the EINJ is only a RAS
> >> debug framework, in this code path, sometimes we need to acesss
> >> the address within the MCFG space directly to trigger kind of HW
> >> error, which behavior does not like the normal device driver's,
> >> in this case some possible unsafe operations (bypass the ecam
> >> ops) can be mitigated because the touched device will generate
> >> some HW errors and the RAS handling part will preempt its
> >> corresponding drivers to fix/log the HW error, that's my
> >> understanding about that.
> > 
> >>>> Signed-off-by: Xuesong Chen <xuesong.chen@linux.alibaba.com>
> >>>> Reported-by: kernel test robot <lkp@intel.com>
> >>>> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> >>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
> >>>> Cc: James Morse <james.morse@arm.com>
> >>>> Cc: Will Deacon <will@kernel.org>
> >>>> Cc: Rafael. J. Wysocki <rafael@kernel.org>
> >>>> Cc: Tony Luck <tony.luck@intel.com>
> >>>> Cc: Tomasz Nowicki <tn@semihalf.com>
> >>>> ---
> >>>>  arch/x86/pci/mmconfig-shared.c | 28 --------------------------
> >>>>  drivers/acpi/apei/apei-base.c  | 45 ++++++++++++++++++++++++++++--------------
> >>>>  2 files changed, 30 insertions(+), 43 deletions(-)
> >>>>
> >>>> diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
> >>>> index 0b961fe6..12f7d96 100644
> >>>> --- a/arch/x86/pci/mmconfig-shared.c
> >>>> +++ b/arch/x86/pci/mmconfig-shared.c
> >>>> @@ -605,32 +605,6 @@ static int __init pci_parse_mcfg(struct acpi_table_header *header)
> >>>>  	return 0;
> >>>>  }
> >>>>  
> >>>> -#ifdef CONFIG_ACPI_APEI
> >>>> -extern int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size,
> >>>> -				     void *data), void *data);
> >>>> -
> >>>> -static int pci_mmcfg_for_each_region(int (*func)(__u64 start, __u64 size,
> >>>> -				     void *data), void *data)
> >>>> -{
> >>>> -	struct pci_mmcfg_region *cfg;
> >>>> -	int rc;
> >>>> -
> >>>> -	if (list_empty(&pci_mmcfg_list))
> >>>> -		return 0;
> >>>> -
> >>>> -	list_for_each_entry(cfg, &pci_mmcfg_list, list) {
> >>>> -		rc = func(cfg->res.start, resource_size(&cfg->res), data);
> >>>> -		if (rc)
> >>>> -			return rc;
> >>>> -	}
> >>>> -
> >>>> -	return 0;
> >>>> -}
> >>>> -#define set_apei_filter() (arch_apei_filter_addr = pci_mmcfg_for_each_region)
> >>>> -#else
> >>>> -#define set_apei_filter()
> >>>> -#endif
> >>>> -
> >>>>  static void __init __pci_mmcfg_init(int early)
> >>>>  {
> >>>>  	pci_mmcfg_reject_broken(early);
> >>>> @@ -665,8 +639,6 @@ void __init pci_mmcfg_early_init(void)
> >>>>  		else
> >>>>  			acpi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
> >>>>  		__pci_mmcfg_init(1);
> >>>> -
> >>>> -		set_apei_filter();
> >>>>  	}
> >>>>  }
> >>>>  
> >>>> diff --git a/drivers/acpi/apei/apei-base.c b/drivers/acpi/apei/apei-base.c
> >>>> index c7fdb12..daae75a 100644
> >>>> --- a/drivers/acpi/apei/apei-base.c
> >>>> +++ b/drivers/acpi/apei/apei-base.c
> >>>> @@ -21,6 +21,7 @@
> >>>>  #include <linux/kernel.h>
> >>>>  #include <linux/module.h>
> >>>>  #include <linux/init.h>
> >>>> +#include <linux/pci.h>
> >>>>  #include <linux/acpi.h>
> >>>>  #include <linux/slab.h>
> >>>>  #include <linux/io.h>
> >>>> @@ -448,13 +449,34 @@ static int apei_get_nvs_resources(struct apei_resources *resources)
> >>>>  	return acpi_nvs_for_each_region(apei_get_res_callback, resources);
> >>>>  }
> >>>>  
> >>>> -int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size,
> >>>> -				     void *data), void *data);
> >>>> -static int apei_get_arch_resources(struct apei_resources *resources)
> >>>> +#ifdef CONFIG_PCI
> >>>> +extern struct list_head pci_mmcfg_list;
> >>>> +static int apei_filter_mcfg_addr(struct apei_resources *res,
> >>>> +			struct apei_resources *mcfg_res)
> >>>> +{
> >>>> +	int rc = 0;
> >>>> +	struct pci_mmcfg_region *cfg;
> >>>> +
> >>>> +	if (list_empty(&pci_mmcfg_list))
> >>>> +		return 0;
> >>>> +
> >>>> +	apei_resources_init(mcfg_res);
> >>>> +	list_for_each_entry(cfg, &pci_mmcfg_list, list) {
> >>>> +		rc = apei_res_add(&mcfg_res->iomem, cfg->res.start, resource_size(&cfg->res));
> >>>> +		if (rc)
> >>>> +			return rc;
> >>>> +	}
> >>>>  
> >>>> +	/* filter the mcfg resource from current APEI's */
> >>>> +	return apei_resources_sub(res, mcfg_res);
> >>>> +}
> >>>> +#else
> >>>> +static inline int apei_filter_mcfg_addr(struct apei_resources *res,
> >>>> +			struct apei_resources *mcfg_res)
> >>>>  {
> >>>> -	return arch_apei_filter_addr(apei_get_res_callback, resources);
> >>>> +	return 0;
> >>>>  }
> >>>> +#endif
> >>>>  
> >>>>  /*
> >>>>   * IO memory/port resource management mechanism is used to check
> >>>> @@ -486,15 +508,9 @@ int apei_resources_request(struct apei_resources *resources,
> >>>>  	if (rc)
> >>>>  		goto nvs_res_fini;
> >>>>  
> >>>> -	if (arch_apei_filter_addr) {
> >>>> -		apei_resources_init(&arch_res);
> >>>> -		rc = apei_get_arch_resources(&arch_res);
> >>>> -		if (rc)
> >>>> -			goto arch_res_fini;
> >>>> -		rc = apei_resources_sub(resources, &arch_res);
> >>>> -		if (rc)
> >>>> -			goto arch_res_fini;
> >>>> -	}
> >>>> +	rc = apei_filter_mcfg_addr(resources, &arch_res);
> >>>> +	if (rc)
> >>>> +		goto arch_res_fini;
> >>>>  
> >>>>  	rc = -EINVAL;
> >>>>  	list_for_each_entry(res, &resources->iomem, list) {
> >>>> @@ -544,8 +560,7 @@ int apei_resources_request(struct apei_resources *resources,
> >>>>  		release_mem_region(res->start, res->end - res->start);
> >>>>  	}
> >>>>  arch_res_fini:
> >>>> -	if (arch_apei_filter_addr)
> >>>> -		apei_resources_fini(&arch_res);
> >>>> +	apei_resources_fini(&arch_res);
> >>>>  nvs_res_fini:
> >>>>  	apei_resources_fini(&nvs_resources);
> >>>>  	return rc;
> >>>> -- 
> >>>> 1.8.3.1
> >>>>
Xuesong Chen Oct. 22, 2021, 9:52 a.m. UTC | #8
On 22/10/2021 00:57, Bjorn Helgaas wrote:
> On Thu, Oct 21, 2021 at 11:46:40PM +0800, Xuesong Chen wrote:
>> On 21/10/2021 02:50, Bjorn Helgaas wrote:
>>> On Wed, Oct 20, 2021 at 11:16:38AM +0800, Xuesong Chen wrote:
>>>> On 20/10/2021 03:23, Bjorn Helgaas wrote:
>>>>> On Tue, Oct 19, 2021 at 12:50:33PM +0800, Xuesong Chen wrote:
> 
>>>>>> This patch will try to handle this case in a more common way
>>>>>> instead of the original 'arch' specific solution, which will be
>>>>>> beneficial to all the APEI-dependent platforms after that.
>>>>>
>>>>> This actually doesn't say anything about what the patch does or
>>>>> how it works.  It says "handles this case in a more common way"
>>>>> but with no details.
>>>>
>>>> Good suggestion, I'll give more details about that...
>>>>
>>>>> The EINJ table contains "injection instructions" that can read
>>>>> or write "register regions" described by generic address
>>>>> structures (see ACPI v6.3, sec 18.6.2 and 18.6.3), and
>>>>> __einj_error_trigger() requests those register regions with
>>>>> request_mem_region() or request_region() before executing the
>>>>> injections instructions.
>>>>>
>>>>> IIUC, this patch basically says "if this region is part of the
>>>>> MCFG area, we don't need to reserve it." That leads to the
>>>>> questions of why we need to reserve *any* of the areas
>>>>
>>>> AFAIK, the MCFG area is reserved since the ECAM module will
>>>> provide a generic Kernel Programming Interfaces(KPI), e.g,
>>>> pci_generic_config_read(...), so all the drivers are allowed to
>>>> access the pci config space only by those KPIs in a consistent
>>>> and safe way, direct raw access will break the rule.  Correct me
>>>> if I am missing sth.
>>>>
>>>>> and why it's safe to simply skip reserving regions that are part
>>>>> of the MCFG area.
>>>>
>>>> Actual there is a commit d91525eb8ee6("ACPI, EINJ: Enhance error
>>>> injection tolerance level") before to address this issue, the
>>>> entire commit log as below:
>>>>
>>>>     Some BIOSes utilize PCI MMCFG space read/write opertion to trigger
>>>>     specific errors. EINJ will report errors as below when hitting such
>>>>     cases:
>>>>     
>>>>     APEI: Can not request [mem 0x83f990a0-0x83f990a3] for APEI EINJ Trigger registers
>>>>     
>>>>     It is because on x86 platform ACPI based PCI MMCFG logic has
>>>>     reserved all MMCFG spaces so that EINJ can't reserve it again.
>>>>     We already trust the ACPI/APEI code when using the EINJ interface
>>>>     so it is not a big leap to also trust it to access the right
>>>>     MMCFG addresses. Skip address checking to allow the access.
>>>
>>> I'm not really convinced by that justification because I don't
>>> think the issue here is *trust*.  If all we care about is trust,
>>> and we trust the ACPI/APEI code, why do we need to reserve
>>> anything at all when executing EINJ actions?
>>>
>>> I think the resource reservation issue is about coordinating
>>> multiple users of the address space.  A driver reserves the MMIO
>>> address space of a device it controls so no other driver can
>>> reserve it at the same time and cause conflicts.
>>>
>>> I'm not really convinced by this mutual exclusion argument either,
>>> because I haven't yet seen a situation where we say "EINJ needs a
>>> resource that's already in use by somebody else, so we can't use
>>> EINJ."  When conflicts arise, the response is always "we'll just
>>> stop reserving this conflicting resource but use it anyway."
>>>
>>> I think the only real value in apei_resources_request() is a
>>> little bit of documentation in /proc/iomem.  For ERST and EINJ,
>>> even that only lasts for the tiny period when we're actually
>>> executing an action.
>>>
>>> So convince me there's a reason why we shouldn't just remove
>>> apei_resources_request() completely :)
>>
>> I have to confess that currently I have no strong evidence/reason to
>> convince you that it's absolute safe to remove
>> apei_resources_request(),  probably in some conditions it *does*
>> require to follow the mutual exclusion usage model.  The ECAM/MCFG
>> maybe a special case not like other normal device driver, since all
>> its MCFG space has been reserved during the initialization. Anyway,
>> it's another topic and good point well worth discussing in the
>> future.
> 
> This is missing the point.  It's not the MCFG reservation during
> initialization that would make this safe.  What would make it safe is
> the fact that ECAM does not require mutual exclusion.
> 
> When the hardware implements ECAM correctly, PCI config accesses do
> not require locking because a config access requires a single MMIO
> load or store.
> 
I don't quite understand here, we're talking about apei_resources_request()
which is a mechanism to void resource conflict,"request_mem_region() tells the
kernel that your driver is going to use this range of I/O addresses, which
will prevent other drivers to make any overlapping call to the same region
through request_mem_region()", but according to the context of 'a single MMIO
load or store', are you talking about something like the mutex lock primitive?

> Many non-ECAM config accessors *do* require locking because they use
> several register accesses, e.g., the 0xCF8/0xCFC address/data pairs
> used by pci_conf1_read().  If EINJ actions used these, we would have
> to enforce mutual exclusion between EINJ config accesses and those
> done by other drivers.

I take a look at the pci_conf1_read() function, there's only a pair of
raw_spin_lock_irqsave() and raw_spin_unlock_irqrestore(), if that's the
mutual exclusion you mentioned, seems it's not related to the
apei_resources_request() we're talking about... 

> 
> Some ARM64 platforms do not implement ECAM correctly, e.g.,
> tegra194_map_bus() programs an outbound ATU and xgene_pcie_map_bus()
> sets an RTDID register before the MMIO load/store.  Platforms like
> this *do* require mutual exclusion between an EINJ config access and
> other config accesses.

What's the mutual exclusion for those quirk functions(tegra194 and xgene)?
*mutual* is not applied for single side. I can see neither locking nor
request_mem_region() in those bus map functions. 
> 
> These platforms are supported via quirks in pci_mcfg.c, so they will
> have resources in the pci_mcfg_list, and if we just ignore all the
> MCFG resources in apei_resources_request(), there will be nothing to
> prevent ordinary driver config accesses from being corrupted by EINJ
> accesses.
> 
> I think in general, is probably *is* safe to remove MCFG resources
> from the APEI reservations, but it would be better if we had some way
> to prevent EINJ from using MCFG on platforms like tegra194 and xgene.

Just as I mentioned, since there's no mutual exclusion applied for the
tegra194 and xgene(correct me if I am wrong), putting their MCFG resources
into the APEI reservation(so the apei_resources_request() applied) does nothing 

Thanks,
Xuesong
> 
>> From the patch set itself, I don't think it's a nice idea to make a
>> dramatic change regarding the apei_resources_request() part, I
>> suggest to keep the original rationale untouched and based on that
>> to fix the real issue at hand in a more generic way.
> 
> There *was* no original rationale.  The whole point of this
> conversation is to figure out what the real rationale is.
> 
>>>> Except that the above explanation, IMO the EINJ is only a RAS
>>>> debug framework, in this code path, sometimes we need to acesss
>>>> the address within the MCFG space directly to trigger kind of HW
>>>> error, which behavior does not like the normal device driver's,
>>>> in this case some possible unsafe operations (bypass the ecam
>>>> ops) can be mitigated because the touched device will generate
>>>> some HW errors and the RAS handling part will preempt its
>>>> corresponding drivers to fix/log the HW error, that's my
>>>> understanding about that.
>>>
>>>>>> Signed-off-by: Xuesong Chen <xuesong.chen@linux.alibaba.com>
>>>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>>>> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
>>>>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>>>>> Cc: James Morse <james.morse@arm.com>
>>>>>> Cc: Will Deacon <will@kernel.org>
>>>>>> Cc: Rafael. J. Wysocki <rafael@kernel.org>
>>>>>> Cc: Tony Luck <tony.luck@intel.com>
>>>>>> Cc: Tomasz Nowicki <tn@semihalf.com>
>>>>>> ---
>>>>>>  arch/x86/pci/mmconfig-shared.c | 28 --------------------------
>>>>>>  drivers/acpi/apei/apei-base.c  | 45 ++++++++++++++++++++++++++++--------------
>>>>>>  2 files changed, 30 insertions(+), 43 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
>>>>>> index 0b961fe6..12f7d96 100644
>>>>>> --- a/arch/x86/pci/mmconfig-shared.c
>>>>>> +++ b/arch/x86/pci/mmconfig-shared.c
>>>>>> @@ -605,32 +605,6 @@ static int __init pci_parse_mcfg(struct acpi_table_header *header)
>>>>>>  	return 0;
>>>>>>  }
>>>>>>  
>>>>>> -#ifdef CONFIG_ACPI_APEI
>>>>>> -extern int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size,
>>>>>> -				     void *data), void *data);
>>>>>> -
>>>>>> -static int pci_mmcfg_for_each_region(int (*func)(__u64 start, __u64 size,
>>>>>> -				     void *data), void *data)
>>>>>> -{
>>>>>> -	struct pci_mmcfg_region *cfg;
>>>>>> -	int rc;
>>>>>> -
>>>>>> -	if (list_empty(&pci_mmcfg_list))
>>>>>> -		return 0;
>>>>>> -
>>>>>> -	list_for_each_entry(cfg, &pci_mmcfg_list, list) {
>>>>>> -		rc = func(cfg->res.start, resource_size(&cfg->res), data);
>>>>>> -		if (rc)
>>>>>> -			return rc;
>>>>>> -	}
>>>>>> -
>>>>>> -	return 0;
>>>>>> -}
>>>>>> -#define set_apei_filter() (arch_apei_filter_addr = pci_mmcfg_for_each_region)
>>>>>> -#else
>>>>>> -#define set_apei_filter()
>>>>>> -#endif
>>>>>> -
>>>>>>  static void __init __pci_mmcfg_init(int early)
>>>>>>  {
>>>>>>  	pci_mmcfg_reject_broken(early);
>>>>>> @@ -665,8 +639,6 @@ void __init pci_mmcfg_early_init(void)
>>>>>>  		else
>>>>>>  			acpi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
>>>>>>  		__pci_mmcfg_init(1);
>>>>>> -
>>>>>> -		set_apei_filter();
>>>>>>  	}
>>>>>>  }
>>>>>>  
>>>>>> diff --git a/drivers/acpi/apei/apei-base.c b/drivers/acpi/apei/apei-base.c
>>>>>> index c7fdb12..daae75a 100644
>>>>>> --- a/drivers/acpi/apei/apei-base.c
>>>>>> +++ b/drivers/acpi/apei/apei-base.c
>>>>>> @@ -21,6 +21,7 @@
>>>>>>  #include <linux/kernel.h>
>>>>>>  #include <linux/module.h>
>>>>>>  #include <linux/init.h>
>>>>>> +#include <linux/pci.h>
>>>>>>  #include <linux/acpi.h>
>>>>>>  #include <linux/slab.h>
>>>>>>  #include <linux/io.h>
>>>>>> @@ -448,13 +449,34 @@ static int apei_get_nvs_resources(struct apei_resources *resources)
>>>>>>  	return acpi_nvs_for_each_region(apei_get_res_callback, resources);
>>>>>>  }
>>>>>>  
>>>>>> -int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size,
>>>>>> -				     void *data), void *data);
>>>>>> -static int apei_get_arch_resources(struct apei_resources *resources)
>>>>>> +#ifdef CONFIG_PCI
>>>>>> +extern struct list_head pci_mmcfg_list;
>>>>>> +static int apei_filter_mcfg_addr(struct apei_resources *res,
>>>>>> +			struct apei_resources *mcfg_res)
>>>>>> +{
>>>>>> +	int rc = 0;
>>>>>> +	struct pci_mmcfg_region *cfg;
>>>>>> +
>>>>>> +	if (list_empty(&pci_mmcfg_list))
>>>>>> +		return 0;
>>>>>> +
>>>>>> +	apei_resources_init(mcfg_res);
>>>>>> +	list_for_each_entry(cfg, &pci_mmcfg_list, list) {
>>>>>> +		rc = apei_res_add(&mcfg_res->iomem, cfg->res.start, resource_size(&cfg->res));
>>>>>> +		if (rc)
>>>>>> +			return rc;
>>>>>> +	}
>>>>>>  
>>>>>> +	/* filter the mcfg resource from current APEI's */
>>>>>> +	return apei_resources_sub(res, mcfg_res);
>>>>>> +}
>>>>>> +#else
>>>>>> +static inline int apei_filter_mcfg_addr(struct apei_resources *res,
>>>>>> +			struct apei_resources *mcfg_res)
>>>>>>  {
>>>>>> -	return arch_apei_filter_addr(apei_get_res_callback, resources);
>>>>>> +	return 0;
>>>>>>  }
>>>>>> +#endif
>>>>>>  
>>>>>>  /*
>>>>>>   * IO memory/port resource management mechanism is used to check
>>>>>> @@ -486,15 +508,9 @@ int apei_resources_request(struct apei_resources *resources,
>>>>>>  	if (rc)
>>>>>>  		goto nvs_res_fini;
>>>>>>  
>>>>>> -	if (arch_apei_filter_addr) {
>>>>>> -		apei_resources_init(&arch_res);
>>>>>> -		rc = apei_get_arch_resources(&arch_res);
>>>>>> -		if (rc)
>>>>>> -			goto arch_res_fini;
>>>>>> -		rc = apei_resources_sub(resources, &arch_res);
>>>>>> -		if (rc)
>>>>>> -			goto arch_res_fini;
>>>>>> -	}
>>>>>> +	rc = apei_filter_mcfg_addr(resources, &arch_res);
>>>>>> +	if (rc)
>>>>>> +		goto arch_res_fini;
>>>>>>  
>>>>>>  	rc = -EINVAL;
>>>>>>  	list_for_each_entry(res, &resources->iomem, list) {
>>>>>> @@ -544,8 +560,7 @@ int apei_resources_request(struct apei_resources *resources,
>>>>>>  		release_mem_region(res->start, res->end - res->start);
>>>>>>  	}
>>>>>>  arch_res_fini:
>>>>>> -	if (arch_apei_filter_addr)
>>>>>> -		apei_resources_fini(&arch_res);
>>>>>> +	apei_resources_fini(&arch_res);
>>>>>>  nvs_res_fini:
>>>>>>  	apei_resources_fini(&nvs_resources);
>>>>>>  	return rc;
>>>>>> -- 
>>>>>> 1.8.3.1
>>>>>>
Bjorn Helgaas Oct. 25, 2021, 11:37 p.m. UTC | #9
On Fri, Oct 22, 2021 at 05:52:15PM +0800, Xuesong Chen wrote:
> On 22/10/2021 00:57, Bjorn Helgaas wrote:
> > On Thu, Oct 21, 2021 at 11:46:40PM +0800, Xuesong Chen wrote:
> >> On 21/10/2021 02:50, Bjorn Helgaas wrote:
> >>> On Wed, Oct 20, 2021 at 11:16:38AM +0800, Xuesong Chen wrote:
> >>>> On 20/10/2021 03:23, Bjorn Helgaas wrote:
> >>>>> On Tue, Oct 19, 2021 at 12:50:33PM +0800, Xuesong Chen wrote:
> > 
> >>>>>> This patch will try to handle this case in a more common way
> >>>>>> instead of the original 'arch' specific solution, which will be
> >>>>>> beneficial to all the APEI-dependent platforms after that.
> >>>>>
> >>>>> This actually doesn't say anything about what the patch does or
> >>>>> how it works.  It says "handles this case in a more common way"
> >>>>> but with no details.
> >>>>
> >>>> Good suggestion, I'll give more details about that...
> >>>>
> >>>>> The EINJ table contains "injection instructions" that can read
> >>>>> or write "register regions" described by generic address
> >>>>> structures (see ACPI v6.3, sec 18.6.2 and 18.6.3), and
> >>>>> __einj_error_trigger() requests those register regions with
> >>>>> request_mem_region() or request_region() before executing the
> >>>>> injections instructions.
> >>>>>
> >>>>> IIUC, this patch basically says "if this region is part of the
> >>>>> MCFG area, we don't need to reserve it." That leads to the
> >>>>> questions of why we need to reserve *any* of the areas
> >>>>
> >>>> AFAIK, the MCFG area is reserved since the ECAM module will
> >>>> provide a generic Kernel Programming Interfaces(KPI), e.g,
> >>>> pci_generic_config_read(...), so all the drivers are allowed to
> >>>> access the pci config space only by those KPIs in a consistent
> >>>> and safe way, direct raw access will break the rule.  Correct me
> >>>> if I am missing sth.
> >>>>
> >>>>> and why it's safe to simply skip reserving regions that are part
> >>>>> of the MCFG area.
> >>>>
> >>>> Actual there is a commit d91525eb8ee6("ACPI, EINJ: Enhance error
> >>>> injection tolerance level") before to address this issue, the
> >>>> entire commit log as below:
> >>>>
> >>>>     Some BIOSes utilize PCI MMCFG space read/write opertion to trigger
> >>>>     specific errors. EINJ will report errors as below when hitting such
> >>>>     cases:
> >>>>     
> >>>>     APEI: Can not request [mem 0x83f990a0-0x83f990a3] for APEI EINJ Trigger registers
> >>>>     
> >>>>     It is because on x86 platform ACPI based PCI MMCFG logic has
> >>>>     reserved all MMCFG spaces so that EINJ can't reserve it again.
> >>>>     We already trust the ACPI/APEI code when using the EINJ interface
> >>>>     so it is not a big leap to also trust it to access the right
> >>>>     MMCFG addresses. Skip address checking to allow the access.
> >>>
> >>> I'm not really convinced by that justification because I don't
> >>> think the issue here is *trust*.  If all we care about is trust,
> >>> and we trust the ACPI/APEI code, why do we need to reserve
> >>> anything at all when executing EINJ actions?
> >>>
> >>> I think the resource reservation issue is about coordinating
> >>> multiple users of the address space.  A driver reserves the MMIO
> >>> address space of a device it controls so no other driver can
> >>> reserve it at the same time and cause conflicts.
> >>>
> >>> I'm not really convinced by this mutual exclusion argument either,
> >>> because I haven't yet seen a situation where we say "EINJ needs a
> >>> resource that's already in use by somebody else, so we can't use
> >>> EINJ."  When conflicts arise, the response is always "we'll just
> >>> stop reserving this conflicting resource but use it anyway."
> >>>
> >>> I think the only real value in apei_resources_request() is a
> >>> little bit of documentation in /proc/iomem.  For ERST and EINJ,
> >>> even that only lasts for the tiny period when we're actually
> >>> executing an action.
> >>>
> >>> So convince me there's a reason why we shouldn't just remove
> >>> apei_resources_request() completely :)
> >>
> >> I have to confess that currently I have no strong evidence/reason to
> >> convince you that it's absolute safe to remove
> >> apei_resources_request(),  probably in some conditions it *does*
> >> require to follow the mutual exclusion usage model.  The ECAM/MCFG
> >> maybe a special case not like other normal device driver, since all
> >> its MCFG space has been reserved during the initialization. Anyway,
> >> it's another topic and good point well worth discussing in the
> >> future.
> > 
> > This is missing the point.  It's not the MCFG reservation during
> > initialization that would make this safe.  What would make it safe is
> > the fact that ECAM does not require mutual exclusion.
> > 
> > When the hardware implements ECAM correctly, PCI config accesses do
> > not require locking because a config access requires a single MMIO
> > load or store.
>
> I don't quite understand here, we're talking about
> apei_resources_request() which is a mechanism to void resource
> conflict,"request_mem_region() tells the kernel that your driver is
> going to use this range of I/O addresses, which will prevent other
> drivers to make any overlapping call to the same region through
> request_mem_region()", but according to the context of 'a single
> MMIO load or store', are you talking about something like the mutex
> lock primitive?

My point was that when ECAM is implemented correctly, a CPU does a
single MMIO load to do a PCI config read and a single MMIO store to do
a PCI config write.  In that case there no need for any locking, so
there's no need for APEI to reserve those resources.

This is what d91525eb8ee6 ("ACPI, EINJ: Enhance error injection
tolerance level") does.  That code change makes sense, but the commit
log does not -- it has nothing to do with trusting the ACPI/APEI code;
it's just that no matter what the EINJ actions do with the MCFG
regions, they cannot interfere with other drivers.

> > Many non-ECAM config accessors *do* require locking because they use
> > several register accesses, e.g., the 0xCF8/0xCFC address/data pairs
> > used by pci_conf1_read().  If EINJ actions used these, we would have
> > to enforce mutual exclusion between EINJ config accesses and those
> > done by other drivers.
> 
> I take a look at the pci_conf1_read() function, there's only a pair of
> raw_spin_lock_irqsave() and raw_spin_unlock_irqrestore(), if that's the
> mutual exclusion you mentioned, seems it's not related to the
> apei_resources_request() we're talking about... 

This was an example of a case where EINJ mutual exclusion *would* be
required.  I do not expect EINJ actions to use the 0xCF8/0xCFC
registers because there is no mechanism to coordinate that with the OS
use of the same registers.

> > Some ARM64 platforms do not implement ECAM correctly, e.g.,
> > tegra194_map_bus() programs an outbound ATU and xgene_pcie_map_bus()
> > sets an RTDID register before the MMIO load/store.  Platforms like
> > this *do* require mutual exclusion between an EINJ config access and
> > other config accesses.
> 
> What's the mutual exclusion for those quirk functions (tegra194 and
> xgene)?  *mutual* is not applied for single side. I can see neither
> locking nor request_mem_region() in those bus map functions. 

These currently depend on the pci_lock.  See PCI_OP_READ() in
drivers/pci/access.c.

EINJ actions cannot acquire the pci_lock, so EINJ actions cannot
safely use ECAM space on those platforms.

> > These platforms are supported via quirks in pci_mcfg.c, so they will
> > have resources in the pci_mcfg_list, and if we just ignore all the
> > MCFG resources in apei_resources_request(), there will be nothing to
> > prevent ordinary driver config accesses from being corrupted by EINJ
> > accesses.
> > 
> > I think in general, is probably *is* safe to remove MCFG resources
> > from the APEI reservations, but it would be better if we had some way
> > to prevent EINJ from using MCFG on platforms like tegra194 and xgene.
> 
> Just as I mentioned, since there's no mutual exclusion applied for
> the tegra194 and xgene (correct me if I am wrong), putting their MCFG
> resources into the APEI reservation (so the apei_resources_request()
> applied) does nothing 

I think apei_resources_request() should continue to reserve MCFG areas
on tegra194 and xgene, but it does not need to reserve them on other
ARM64 platforms.

Bjorn
Xuesong Chen Oct. 26, 2021, 9:16 a.m. UTC | #10
On 26/10/2021 07:37, Bjorn Helgaas wrote:
> On Fri, Oct 22, 2021 at 05:52:15PM +0800, Xuesong Chen wrote:
>> On 22/10/2021 00:57, Bjorn Helgaas wrote:
>>> On Thu, Oct 21, 2021 at 11:46:40PM +0800, Xuesong Chen wrote:
>>>> On 21/10/2021 02:50, Bjorn Helgaas wrote:
>>>>> On Wed, Oct 20, 2021 at 11:16:38AM +0800, Xuesong Chen wrote:
>>>>>> On 20/10/2021 03:23, Bjorn Helgaas wrote:
>>>>>>> On Tue, Oct 19, 2021 at 12:50:33PM +0800, Xuesong Chen wrote:
>>>
>>>>>>>> This patch will try to handle this case in a more common way
>>>>>>>> instead of the original 'arch' specific solution, which will be
>>>>>>>> beneficial to all the APEI-dependent platforms after that.
>>>>>>>
>>>>>>> This actually doesn't say anything about what the patch does or
>>>>>>> how it works.  It says "handles this case in a more common way"
>>>>>>> but with no details.
>>>>>>
>>>>>> Good suggestion, I'll give more details about that...
>>>>>>
>>>>>>> The EINJ table contains "injection instructions" that can read
>>>>>>> or write "register regions" described by generic address
>>>>>>> structures (see ACPI v6.3, sec 18.6.2 and 18.6.3), and
>>>>>>> __einj_error_trigger() requests those register regions with
>>>>>>> request_mem_region() or request_region() before executing the
>>>>>>> injections instructions.
>>>>>>>
>>>>>>> IIUC, this patch basically says "if this region is part of the
>>>>>>> MCFG area, we don't need to reserve it." That leads to the
>>>>>>> questions of why we need to reserve *any* of the areas
>>>>>>
>>>>>> AFAIK, the MCFG area is reserved since the ECAM module will
>>>>>> provide a generic Kernel Programming Interfaces(KPI), e.g,
>>>>>> pci_generic_config_read(...), so all the drivers are allowed to
>>>>>> access the pci config space only by those KPIs in a consistent
>>>>>> and safe way, direct raw access will break the rule.  Correct me
>>>>>> if I am missing sth.
>>>>>>
>>>>>>> and why it's safe to simply skip reserving regions that are part
>>>>>>> of the MCFG area.
>>>>>>
>>>>>> Actual there is a commit d91525eb8ee6("ACPI, EINJ: Enhance error
>>>>>> injection tolerance level") before to address this issue, the
>>>>>> entire commit log as below:
>>>>>>
>>>>>>     Some BIOSes utilize PCI MMCFG space read/write opertion to trigger
>>>>>>     specific errors. EINJ will report errors as below when hitting such
>>>>>>     cases:
>>>>>>     
>>>>>>     APEI: Can not request [mem 0x83f990a0-0x83f990a3] for APEI EINJ Trigger registers
>>>>>>     
>>>>>>     It is because on x86 platform ACPI based PCI MMCFG logic has
>>>>>>     reserved all MMCFG spaces so that EINJ can't reserve it again.
>>>>>>     We already trust the ACPI/APEI code when using the EINJ interface
>>>>>>     so it is not a big leap to also trust it to access the right
>>>>>>     MMCFG addresses. Skip address checking to allow the access.
>>>>>
>>>>> I'm not really convinced by that justification because I don't
>>>>> think the issue here is *trust*.  If all we care about is trust,
>>>>> and we trust the ACPI/APEI code, why do we need to reserve
>>>>> anything at all when executing EINJ actions?
>>>>>
>>>>> I think the resource reservation issue is about coordinating
>>>>> multiple users of the address space.  A driver reserves the MMIO
>>>>> address space of a device it controls so no other driver can
>>>>> reserve it at the same time and cause conflicts.
>>>>>
>>>>> I'm not really convinced by this mutual exclusion argument either,
>>>>> because I haven't yet seen a situation where we say "EINJ needs a
>>>>> resource that's already in use by somebody else, so we can't use
>>>>> EINJ."  When conflicts arise, the response is always "we'll just
>>>>> stop reserving this conflicting resource but use it anyway."
>>>>>
>>>>> I think the only real value in apei_resources_request() is a
>>>>> little bit of documentation in /proc/iomem.  For ERST and EINJ,
>>>>> even that only lasts for the tiny period when we're actually
>>>>> executing an action.
>>>>>
>>>>> So convince me there's a reason why we shouldn't just remove
>>>>> apei_resources_request() completely :)
>>>>
>>>> I have to confess that currently I have no strong evidence/reason to
>>>> convince you that it's absolute safe to remove
>>>> apei_resources_request(),  probably in some conditions it *does*
>>>> require to follow the mutual exclusion usage model.  The ECAM/MCFG
>>>> maybe a special case not like other normal device driver, since all
>>>> its MCFG space has been reserved during the initialization. Anyway,
>>>> it's another topic and good point well worth discussing in the
>>>> future.
>>>
>>> This is missing the point.  It's not the MCFG reservation during
>>> initialization that would make this safe.  What would make it safe is
>>> the fact that ECAM does not require mutual exclusion.
>>>
>>> When the hardware implements ECAM correctly, PCI config accesses do
>>> not require locking because a config access requires a single MMIO
>>> load or store.
>>
>> I don't quite understand here, we're talking about
>> apei_resources_request() which is a mechanism to void resource
>> conflict,"request_mem_region() tells the kernel that your driver is
>> going to use this range of I/O addresses, which will prevent other
>> drivers to make any overlapping call to the same region through
>> request_mem_region()", but according to the context of 'a single
>> MMIO load or store', are you talking about something like the mutex
>> lock primitive?
> 
> My point was that when ECAM is implemented correctly, a CPU does a
> single MMIO load to do a PCI config read and a single MMIO store to do
> a PCI config write.  In that case there no need for any locking, so
> there's no need for APEI to reserve those resources.

Ah, got it. That means the PCI ECAM has a implicit mutual exclusion with EINJ
if the hardware implemention is correct, so we can remove the MCFG from
the APEI's safely.

> 
> This is what d91525eb8ee6 ("ACPI, EINJ: Enhance error injection
> tolerance level") does.  That code change makes sense, but the commit
> log does not -- it has nothing to do with trusting the ACPI/APEI code;
> it's just that no matter what the EINJ actions do with the MCFG
> regions, they cannot interfere with other drivers.
> 
>>> Many non-ECAM config accessors *do* require locking because they use
>>> several register accesses, e.g., the 0xCF8/0xCFC address/data pairs
>>> used by pci_conf1_read().  If EINJ actions used these, we would have
>>> to enforce mutual exclusion between EINJ config accesses and those
>>> done by other drivers.
>>
>> I take a look at the pci_conf1_read() function, there's only a pair of
>> raw_spin_lock_irqsave() and raw_spin_unlock_irqrestore(), if that's the
>> mutual exclusion you mentioned, seems it's not related to the
>> apei_resources_request() we're talking about... 
> 
> This was an example of a case where EINJ mutual exclusion *would* be
> required.  I do not expect EINJ actions to use the 0xCF8/0xCFC
> registers because there is no mechanism to coordinate that with the OS
> use of the same registers.
> 
>>> Some ARM64 platforms do not implement ECAM correctly, e.g.,
>>> tegra194_map_bus() programs an outbound ATU and xgene_pcie_map_bus()
>>> sets an RTDID register before the MMIO load/store.  Platforms like
>>> this *do* require mutual exclusion between an EINJ config access and
>>> other config accesses.
>>
>> What's the mutual exclusion for those quirk functions (tegra194 and
>> xgene)?  *mutual* is not applied for single side. I can see neither
>> locking nor request_mem_region() in those bus map functions. 
> 
> These currently depend on the pci_lock.  See PCI_OP_READ() in
> drivers/pci/access.c.
> 
> EINJ actions cannot acquire the pci_lock, so EINJ actions cannot
> safely use ECAM space on those platforms> 
>>> These platforms are supported via quirks in pci_mcfg.c, so they will
>>> have resources in the pci_mcfg_list, and if we just ignore all the
>>> MCFG resources in apei_resources_request(), there will be nothing to
>>> prevent ordinary driver config accesses from being corrupted by EINJ
>>> accesses.
>>>
>>> I think in general, is probably *is* safe to remove MCFG resources
>>> from the APEI reservations, but it would be better if we had some way
>>> to prevent EINJ from using MCFG on platforms like tegra194 and xgene.
>>
>> Just as I mentioned, since there's no mutual exclusion applied for
>> the tegra194 and xgene (correct me if I am wrong), putting their MCFG
>> resources into the APEI reservation (so the apei_resources_request()
>> applied) does nothing 
> 
> I think apei_resources_request() should continue to reserve MCFG areas
> on tegra194 and xgene, but it does not need to reserve them on other
> ARM64 platforms.

As a summary: we need to reserve the MCFG areas on those platforms with a
quirk ECAM implementation since there's no lockless method to access the
configuration space, on other platforms we don't need to reserve the MCFG
resources (so can remove it safely).

So we need to add another patch to handle the case of tegra194 and xgene...
I will try to figure it out. 


Thanks,
Xuesong

> 
> Bjorn
>
Bjorn Helgaas Oct. 26, 2021, 8:47 p.m. UTC | #11
On Tue, Oct 26, 2021 at 05:16:47PM +0800, Xuesong Chen wrote:
> On 26/10/2021 07:37, Bjorn Helgaas wrote:

> > My point was that when ECAM is implemented correctly, a CPU does a
> > single MMIO load to do a PCI config read and a single MMIO store to do
> > a PCI config write.  In that case there no need for any locking, so
> > there's no need for APEI to reserve those resources.
> 
> Ah, got it. That means the PCI ECAM has a implicit mutual exclusion with EINJ
> if the hardware implemention is correct, so we can remove the MCFG from
> the APEI's safely.

Well, not quite.  ECAM doesn't *need* mutual exclusion.  Single loads
and stores are atomic by definition.

> > I think apei_resources_request() should continue to reserve MCFG areas
> > on tegra194 and xgene, but it does not need to reserve them on other
> > ARM64 platforms.
> 
> As a summary: we need to reserve the MCFG areas on those platforms with a
> quirk ECAM implementation since there's no lockless method to access the
> configuration space, on other platforms we don't need to reserve the MCFG
> resources (so can remove it safely).
> 
> So we need to add another patch to handle the case of tegra194 and xgene...
> I will try to figure it out. 

I looked through these again and found another problem case (thunder).
Here are my notes from my research.

Normal ECAM users require no device-specific support.  The platform
supplies an MCFG table, the generic code works, no mutual exclusion is
required, and APEI doesn't need to reserve the MCFG areas.

The problem cases are platforms that supply an MCFG table but require
some device-specific workarounds.  We can identify these because they
have quirks in pci-mcfg.c.  Here are the existing quirks and the
pci_ecam_ops structs they supply:

  AL_ECAM             al_pcie_ops                 # OK
  QCOM_ECAM32         pci_32b_ops                 # OK
  HISI_QUAD_DOM       hisi_pcie_ops               # OK
  THUNDER_PEM_QUIRK   thunder_pem_ecam_ops        # problem
  THUNDER_PEM_QUIRK   thunder_pem_ecam_ops        # problem
  THUNDER_ECAM_QUIRK  pci_thunder_ecam_ops        # OK
  tegra               tegra194_pcie_ops           # problem
  XGENE_V1_ECAM_MCFG  xgene_v1_pcie_ecam_ops      # problem
  XGENE_V2_ECAM_MCFG  xgene_v2_pcie_ecam_ops      # problem
  ALTRA_ECAM_QUIRK    pci_32b_read_ops            # OK

The ones marked "OK" have .map_bus(), .read(), and .write() methods
that need no mutual exclusion because they boil down to just a single
MMIO load or store.  These are fine and there shouldn't be a problem
if an EINJ action accesses the ECAM space.

The others do require mutual exclusion:

  - thunder_pem_ecam_ops: thunder_pem_config_read() calls
    thunder_pem_bridge_read(), which does a writeq() to PEM_CFG_RD
    followed by a readq().  The writeq() and readq() must be atomic to
    avoid corruption.

  - tegra194_pcie_ops: tegra194_map_bus() programs the ATU.  This and
    the subsequent ECAM read/write must be atomic.

  - xgene_v1_pcie_ecam_ops and xgene_v2_pcie_ecam_ops:
    xgene_pcie_map_bus() sets the RTID.  This and the subsequent ECAM
    read/write must be atomic.

I had to look at all these ops individually to find them, so I don't
see an easy way to identify these problem cases at run-time.

I personally would not have an issue with having APEI try to reserve
the MCFG regions for any platform that has an MCFG quirk.  That would
prevent the al, qcom, hisi, thunder-ecam, and altra drivers from using
EINJ even though it would probably be safe for them.  But we already
know those platforms are not really ACPI-compliant, so ...

Bjorn
Xuesong Chen Oct. 27, 2021, 5:29 a.m. UTC | #12
On 27/10/2021 04:47, Bjorn Helgaas wrote:
> On Tue, Oct 26, 2021 at 05:16:47PM +0800, Xuesong Chen wrote:
>> On 26/10/2021 07:37, Bjorn Helgaas wrote:
> 
>>> My point was that when ECAM is implemented correctly, a CPU does a
>>> single MMIO load to do a PCI config read and a single MMIO store to do
>>> a PCI config write.  In that case there no need for any locking, so
>>> there's no need for APEI to reserve those resources.
>>
>> Ah, got it. That means the PCI ECAM has a implicit mutual exclusion with EINJ
>> if the hardware implemention is correct, so we can remove the MCFG from
>> the APEI's safely.
> 
> Well, not quite.  ECAM doesn't *need* mutual exclusion.  Single loads
> and stores are atomic by definition.
> 

OK, because ECAM config access has intrinsic atomic primitive, so no need to
reserve those ECAM config space resources for APEI, that's why commit d91525eb8ee6 
("ACPI, EINJ: Enhance error injection tolerance level") fix make sense... 

>>> I think apei_resources_request() should continue to reserve MCFG areas
>>> on tegra194 and xgene, but it does not need to reserve them on other
>>> ARM64 platforms.
>>
>> As a summary: we need to reserve the MCFG areas on those platforms with a
>> quirk ECAM implementation since there's no lockless method to access the
>> configuration space, on other platforms we don't need to reserve the MCFG
>> resources (so can remove it safely).
>>
>> So we need to add another patch to handle the case of tegra194 and xgene...
>> I will try to figure it out. 
> 
> I looked through these again and found another problem case (thunder).
> Here are my notes from my research.
> 
> Normal ECAM users require no device-specific support.  The platform
> supplies an MCFG table, the generic code works, no mutual exclusion is
> required, and APEI doesn't need to reserve the MCFG areas.
> 
> The problem cases are platforms that supply an MCFG table but require
> some device-specific workarounds.  We can identify these because they
> have quirks in pci-mcfg.c.  Here are the existing quirks and the
> pci_ecam_ops structs they supply:
> 
>   AL_ECAM             al_pcie_ops                 # OK
>   QCOM_ECAM32         pci_32b_ops                 # OK
>   HISI_QUAD_DOM       hisi_pcie_ops               # OK
>   THUNDER_PEM_QUIRK   thunder_pem_ecam_ops        # problem
>   THUNDER_PEM_QUIRK   thunder_pem_ecam_ops        # problem
>   THUNDER_ECAM_QUIRK  pci_thunder_ecam_ops        # OK
>   tegra               tegra194_pcie_ops           # problem
>   XGENE_V1_ECAM_MCFG  xgene_v1_pcie_ecam_ops      # problem
>   XGENE_V2_ECAM_MCFG  xgene_v2_pcie_ecam_ops      # problem
>   ALTRA_ECAM_QUIRK    pci_32b_read_ops            # OK
> 
> The ones marked "OK" have .map_bus(), .read(), and .write() methods
> that need no mutual exclusion because they boil down to just a single
> MMIO load or store.  These are fine and there shouldn't be a problem
> if an EINJ action accesses the ECAM space.
> 
> The others do require mutual exclusion:
> 
>   - thunder_pem_ecam_ops: thunder_pem_config_read() calls
>     thunder_pem_bridge_read(), which does a writeq() to PEM_CFG_RD
>     followed by a readq().  The writeq() and readq() must be atomic to
>     avoid corruption.
> 
>   - tegra194_pcie_ops: tegra194_map_bus() programs the ATU.  This and
>     the subsequent ECAM read/write must be atomic.
> 
>   - xgene_v1_pcie_ecam_ops and xgene_v2_pcie_ecam_ops:
>     xgene_pcie_map_bus() sets the RTID.  This and the subsequent ECAM
>     read/write must be atomic.
> 
> I had to look at all these ops individually to find them, so I don't
> see an easy way to identify these problem cases at run-time.
> 
> I personally would not have an issue with having APEI try to reserve
> the MCFG regions for any platform that has an MCFG quirk.  That would
> prevent the al, qcom, hisi, thunder-ecam, and altra drivers from using
> EINJ even though it would probably be safe for them.  But we already
> know those platforms are not really ACPI-compliant, so ...

OK, understood. Since those platforms are not really ACPI-compliant, so
we can unify all the quirks together. Let me send an inital solution about
this for your review and see if there's room for further improvement...

Thanks,
Xuesong

> 
> Bjorn
>
diff mbox series

Patch

diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 0b961fe6..12f7d96 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -605,32 +605,6 @@  static int __init pci_parse_mcfg(struct acpi_table_header *header)
 	return 0;
 }
 
-#ifdef CONFIG_ACPI_APEI
-extern int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size,
-				     void *data), void *data);
-
-static int pci_mmcfg_for_each_region(int (*func)(__u64 start, __u64 size,
-				     void *data), void *data)
-{
-	struct pci_mmcfg_region *cfg;
-	int rc;
-
-	if (list_empty(&pci_mmcfg_list))
-		return 0;
-
-	list_for_each_entry(cfg, &pci_mmcfg_list, list) {
-		rc = func(cfg->res.start, resource_size(&cfg->res), data);
-		if (rc)
-			return rc;
-	}
-
-	return 0;
-}
-#define set_apei_filter() (arch_apei_filter_addr = pci_mmcfg_for_each_region)
-#else
-#define set_apei_filter()
-#endif
-
 static void __init __pci_mmcfg_init(int early)
 {
 	pci_mmcfg_reject_broken(early);
@@ -665,8 +639,6 @@  void __init pci_mmcfg_early_init(void)
 		else
 			acpi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
 		__pci_mmcfg_init(1);
-
-		set_apei_filter();
 	}
 }
 
diff --git a/drivers/acpi/apei/apei-base.c b/drivers/acpi/apei/apei-base.c
index c7fdb12..daae75a 100644
--- a/drivers/acpi/apei/apei-base.c
+++ b/drivers/acpi/apei/apei-base.c
@@ -21,6 +21,7 @@ 
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/init.h>
+#include <linux/pci.h>
 #include <linux/acpi.h>
 #include <linux/slab.h>
 #include <linux/io.h>
@@ -448,13 +449,34 @@  static int apei_get_nvs_resources(struct apei_resources *resources)
 	return acpi_nvs_for_each_region(apei_get_res_callback, resources);
 }
 
-int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size,
-				     void *data), void *data);
-static int apei_get_arch_resources(struct apei_resources *resources)
+#ifdef CONFIG_PCI
+extern struct list_head pci_mmcfg_list;
+static int apei_filter_mcfg_addr(struct apei_resources *res,
+			struct apei_resources *mcfg_res)
+{
+	int rc = 0;
+	struct pci_mmcfg_region *cfg;
+
+	if (list_empty(&pci_mmcfg_list))
+		return 0;
+
+	apei_resources_init(mcfg_res);
+	list_for_each_entry(cfg, &pci_mmcfg_list, list) {
+		rc = apei_res_add(&mcfg_res->iomem, cfg->res.start, resource_size(&cfg->res));
+		if (rc)
+			return rc;
+	}
 
+	/* filter the mcfg resource from current APEI's */
+	return apei_resources_sub(res, mcfg_res);
+}
+#else
+static inline int apei_filter_mcfg_addr(struct apei_resources *res,
+			struct apei_resources *mcfg_res)
 {
-	return arch_apei_filter_addr(apei_get_res_callback, resources);
+	return 0;
 }
+#endif
 
 /*
  * IO memory/port resource management mechanism is used to check
@@ -486,15 +508,9 @@  int apei_resources_request(struct apei_resources *resources,
 	if (rc)
 		goto nvs_res_fini;
 
-	if (arch_apei_filter_addr) {
-		apei_resources_init(&arch_res);
-		rc = apei_get_arch_resources(&arch_res);
-		if (rc)
-			goto arch_res_fini;
-		rc = apei_resources_sub(resources, &arch_res);
-		if (rc)
-			goto arch_res_fini;
-	}
+	rc = apei_filter_mcfg_addr(resources, &arch_res);
+	if (rc)
+		goto arch_res_fini;
 
 	rc = -EINVAL;
 	list_for_each_entry(res, &resources->iomem, list) {
@@ -544,8 +560,7 @@  int apei_resources_request(struct apei_resources *resources,
 		release_mem_region(res->start, res->end - res->start);
 	}
 arch_res_fini:
-	if (arch_apei_filter_addr)
-		apei_resources_fini(&arch_res);
+	apei_resources_fini(&arch_res);
 nvs_res_fini:
 	apei_resources_fini(&nvs_resources);
 	return rc;