diff mbox series

[5/8] PCI/AER: Introduce ratelimit for AER IRQs

Message ID 20250115074301.3514927-6-pandoh@google.com (mailing list archive)
State New
Delegated to: Bjorn Helgaas
Headers show
Series Rate limit AER logs/IRQs | expand

Commit Message

Jon Pan-Doh Jan. 15, 2025, 7:42 a.m. UTC
After ratelimiting logs, spammy devices can still slow execution by
continued AER IRQ servicing.

Add higher per-device ratelimits for AER errors to mask out those IRQs.
Set the default rate to 3x default AER ratelimit (30 per 5s).

Tested using aer-inject[1] tool. Injected 32 AER errors. Observed IRQ
masked via lspci and sysfs counters record 31 errors (1 masked).

Before: CEMsk: BadTLP-
After:  CEMsk: BadTLP+

[1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git

Signed-off-by: Jon Pan-Doh <pandoh@google.com>
---
 Documentation/PCI/pcieaer-howto.rst |  4 +-
 drivers/pci/pcie/aer.c              | 64 +++++++++++++++++++++++++----
 2 files changed, 57 insertions(+), 11 deletions(-)

Comments

Karolina Stolarek Jan. 16, 2025, 12:02 p.m. UTC | #1
On 15/01/2025 08:42, Jon Pan-Doh wrote:
> (...)
 > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
 > index 025c50b0f293..1db70ae87f52 100644
 > --- a/drivers/pci/pcie/aer.c
 > +++ b/drivers/pci/pcie/aer.c
> @@ -676,6 +682,39 @@ static void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
>   	}
>   }
>   
> +static void mask_reported_error(struct pci_dev *dev, struct aer_err_info *info)
> +{
> +	const char **strings;
> +	const char *errmsg;
> +	u16 aer_offset = dev->aer_cap;
> +	u16 mask_reg_offset;
> +	u32 mask;
> +	unsigned long status = info->status;
> +	int i;
> +
> +	if (info->severity == AER_CORRECTABLE) {
> +		strings = aer_correctable_error_string;
> +		mask_reg_offset = PCI_ERR_COR_MASK;
> +	} else {
> +		strings = aer_uncorrectable_error_string;
> +		mask_reg_offset = PCI_ERR_UNCOR_MASK;
> +	}
> +
> +	pci_read_config_dword(dev, aer_offset + mask_reg_offset, &mask);
> +	mask |= status;
> +	pci_write_config_dword(dev, aer_offset + mask_reg_offset, mask);
> +
> +	pci_warn(dev, "%s error(s) masked due to rate-limiting:",
> +		 aer_error_severity_string[info->severity]);
> +	for_each_set_bit(i, &status, 32) {
> +		errmsg = strings[i];
> +		if (!errmsg)
> +			errmsg = "Unknown Error Bit";
> +
> +		pci_warn(dev, "   [%2d] %-22s\n", i, errmsg);
> +	}
> +}
> +

My approach was more heavy-handed :)

To confirm that I understand the flow -- when we're processing
aer_err_info, that potentially carries a couple of errors, and we hit a
ratelimit, we mask the error bits in Error Status Register and print
a warning. After doing so, we won't see these types of errors reported
again. What if some time passes (let's say, 2 mins), and we hit a
condition that would normally generate an error but it's now masked? Are
we fine with missing it? I think we should be informed about
Uncorrectable errors as much as possible, as they indicate Link
integrity issues.

> -	if (!__ratelimit(ratelimit))
> +	if (!__ratelimit(irq_ratelimit)) {
> +		mask_reported_error(dev, info);
> +		return;
> +	}
> +	if (!__ratelimit(log_ratelimit))
>   		return;

Nit: add a line between these two ifs

All the best,
Karolina
Jon Pan-Doh Jan. 18, 2025, 1:58 a.m. UTC | #2
On Thu, Jan 16, 2025 at 4:02 AM Karolina Stolarek
<karolina.stolarek@oracle.com> wrote:
> To confirm that I understand the flow -- when we're processing
> aer_err_info, that potentially carries a couple of errors, and we hit a
> ratelimit, we mask the error bits in Error Status Register and print
> a warning. After doing so, we won't see these types of errors reported
> again. What if some time passes (let's say, 2 mins), and we hit a
> condition that would normally generate an error but it's now masked? Are
> we fine with missing it? I think we should be informed about
> Uncorrectable errors as much as possible, as they indicate Link
> integrity issues.

Your understanding is correct. There's definitely more nuance/tradeoff
with uncorrectable errors (likelihood of uncorrectable spam vs.
missing critical errors). At the minimum, I think the uncorrectable
IRQ default should be higher (semi-arbitrarily chose defaults for
IRQs).

I think a dynamic (un)masking in the kernel is a bit too much and
punted the decision to userspace (e.g. rasdaemon et al.) to manage
(part of OCP Fault Management groups roadmap).

Other options include:
- only focus on correctable errors
    - seen uncorrectable spam e.g. new HW bringup but it is rarer
- some type of system-wide toggle (sysfs, kernel config/cmdline) for
uncorrectable spam handling (may be clunky)

Thanks,
Jon
Karolina Stolarek Jan. 20, 2025, 10:38 a.m. UTC | #3
On 18/01/2025 02:58, Jon Pan-Doh wrote:
> On Thu, Jan 16, 2025 at 4:02 AM Karolina Stolarek
> <karolina.stolarek@oracle.com> wrote:
>> To confirm that I understand the flow -- when we're processing
>> aer_err_info, that potentially carries a couple of errors, and we hit a
>> ratelimit, we mask the error bits in Error Status Register and print
>> a warning. After doing so, we won't see these types of errors reported
>> again. What if some time passes (let's say, 2 mins), and we hit a
>> condition that would normally generate an error but it's now masked? Are
>> we fine with missing it? I think we should be informed about
>> Uncorrectable errors as much as possible, as they indicate Link
>> integrity issues.
> 
> Your understanding is correct. There's definitely more nuance/tradeoff
> with uncorrectable errors (likelihood of uncorrectable spam vs.
> missing critical errors). At the minimum, I think the uncorrectable
> IRQ default should be higher (semi-arbitrarily chose defaults for
> IRQs).

My comment was mostly me worrying about Uncorrectable errors. Pulling 
the plug on Correctable errors after we see too many is reasonable, I think.

> I think a dynamic (un)masking in the kernel is a bit too much and
> punted the decision to userspace (e.g. rasdaemon et al.) to manage
> (part of OCP Fault Management groups roadmap).

I agree. Still, if we decide to go with IRQ masking for (Un)correctable 
errors, this should be communicated to the user in the documentation.

All the best,
Karolina

> 
> Other options include:
> - only focus on correctable errors
>      - seen uncorrectable spam e.g. new HW bringup but it is rarer
> - some type of system-wide toggle (sysfs, kernel config/cmdline) for
> uncorrectable spam handling (may be clunky)
> 
> Thanks,
> Jon
Lukas Wunner Jan. 25, 2025, 7:39 a.m. UTC | #4
On Tue, Jan 14, 2025 at 11:42:57PM -0800, Jon Pan-Doh wrote:
> After ratelimiting logs, spammy devices can still slow execution by
> continued AER IRQ servicing.
> 
> Add higher per-device ratelimits for AER errors to mask out those IRQs.
> Set the default rate to 3x default AER ratelimit (30 per 5s).

Masking errors at the register level feels overzealous,
in particular because it also disables logging via tracepoints.

Is there a concrete device that necessitates this change?
If there is, consider adding a quirk for this particular device
which masks specific errors, but doesn't affect other devices.
If there isn't, consider dropping this change until a buggy device
appears that actually needs it.

Thanks,

Lukas
Jonathan Cameron Jan. 31, 2025, 2:43 p.m. UTC | #5
On Sat, 25 Jan 2025 08:39:35 +0100
Lukas Wunner <lukas@wunner.de> wrote:

> On Tue, Jan 14, 2025 at 11:42:57PM -0800, Jon Pan-Doh wrote:
> > After ratelimiting logs, spammy devices can still slow execution by
> > continued AER IRQ servicing.
> > 
> > Add higher per-device ratelimits for AER errors to mask out those IRQs.
> > Set the default rate to 3x default AER ratelimit (30 per 5s).  
> 
> Masking errors at the register level feels overzealous,
> in particular because it also disables logging via tracepoints.
> 
> Is there a concrete device that necessitates this change?
> If there is, consider adding a quirk for this particular device
> which masks specific errors, but doesn't affect other devices.
> If there isn't, consider dropping this change until a buggy device
> appears that actually needs it.

Fully agree with this comment.  At very least this should default
to not ratelimiting on the tracepoints unless a specific opt in has
occurred (probably a platform or device driver quirk).

In particular I'd worry that you are masking whatever errors are
finally trigger masking.  That might be the only one of that
particular type that was seen and I think the only report we
see is the 'I masked it message'.  So rasdaemon for example
never sees the error at all.   So another tweak would be report
one last time so we definitely see any given error type at least
once.

For CXL errors we trigger off one AER error type (internal error),
but then that is multiplexed onto finer grained errors. Even if
we fix the above we would want the masking in the CXL RAS controls,
not AER.

Jonathan


> 
> Thanks,
> 
> Lukas
>
Jonathan Cameron Jan. 31, 2025, 2:55 p.m. UTC | #6
On Tue, 14 Jan 2025 23:42:57 -0800
Jon Pan-Doh <pandoh@google.com> wrote:

> After ratelimiting logs, spammy devices can still slow execution by
> continued AER IRQ servicing.
> 
> Add higher per-device ratelimits for AER errors to mask out those IRQs.
> Set the default rate to 3x default AER ratelimit (30 per 5s).
> 
> Tested using aer-inject[1] tool. Injected 32 AER errors. Observed IRQ
> masked via lspci and sysfs counters record 31 errors (1 masked).
> 
> Before: CEMsk: BadTLP-
> After:  CEMsk: BadTLP+
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git
> 
> Signed-off-by: Jon Pan-Doh <pandoh@google.com>
Hi Jon,

Comment inline.

> ---
>  Documentation/PCI/pcieaer-howto.rst |  4 +-
>  drivers/pci/pcie/aer.c              | 64 +++++++++++++++++++++++++----
>  2 files changed, 57 insertions(+), 11 deletions(-)
> 
> diff --git a/Documentation/PCI/pcieaer-howto.rst b/Documentation/PCI/pcieaer-howto.rst
> index 5546de60f184..d41272504b18 100644
> --- a/Documentation/PCI/pcieaer-howto.rst
> +++ b/Documentation/PCI/pcieaer-howto.rst
> @@ -88,8 +88,8 @@ fields.
>  AER Ratelimits
>  -------------------------
>  
> -Error messages are ratelimited per device and error type. This prevents spammy
> -devices from flooding the console.
> +Errors, both at log and IRQ level, are ratelimited per device and error type.
> +This prevents spammy devices from stalling execution.
>  
>  AER Statistics / Counters
>  -------------------------
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 025c50b0f293..1db70ae87f52 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -87,6 +87,8 @@ struct aer_info {
>  	u64 rootport_total_nonfatal_errs;
>  
>  	/* Ratelimits for errors */
> +	struct ratelimit_state cor_irq_ratelimit;
> +	struct ratelimit_state uncor_irq_ratelimit;
>  	struct ratelimit_state cor_log_ratelimit;
>  	struct ratelimit_state uncor_log_ratelimit;
>  };
> @@ -379,6 +381,10 @@ void pci_aer_init(struct pci_dev *dev)
>  		return;
>  
>  	dev->aer_info = kzalloc(sizeof(struct aer_info), GFP_KERNEL);
> +	ratelimit_state_init(&dev->aer_info->cor_irq_ratelimit,
> +			     DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST*3);
> +	ratelimit_state_init(&dev->aer_info->uncor_irq_ratelimit,
> +			     DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST*3);
>  	ratelimit_state_init(&dev->aer_info->cor_log_ratelimit,
>  			     DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST);
>  	ratelimit_state_init(&dev->aer_info->uncor_log_ratelimit,
> @@ -676,6 +682,39 @@ static void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
>  	}
>  }
>  
> +static void mask_reported_error(struct pci_dev *dev, struct aer_err_info *info)
> +{
> +	const char **strings;
> +	const char *errmsg;
> +	u16 aer_offset = dev->aer_cap;
> +	u16 mask_reg_offset;
> +	u32 mask;
> +	unsigned long status = info->status;
> +	int i;
> +
> +	if (info->severity == AER_CORRECTABLE) {
> +		strings = aer_correctable_error_string;
> +		mask_reg_offset = PCI_ERR_COR_MASK;
> +	} else {
> +		strings = aer_uncorrectable_error_string;
> +		mask_reg_offset = PCI_ERR_UNCOR_MASK;
> +	}
> +
> +	pci_read_config_dword(dev, aer_offset + mask_reg_offset, &mask);
> +	mask |= status;
> +	pci_write_config_dword(dev, aer_offset + mask_reg_offset, mask);
> +
> +	pci_warn(dev, "%s error(s) masked due to rate-limiting:",
> +		 aer_error_severity_string[info->severity]);
> +	for_each_set_bit(i, &status, 32) {
> +		errmsg = strings[i];
> +		if (!errmsg)
> +			errmsg = "Unknown Error Bit";
> +
> +		pci_warn(dev, "   [%2d] %-22s\n", i, errmsg);
> +	}
> +}
> +
>  static void __print_tlp_header(struct pci_dev *dev, struct pcie_tlp_log *t)
>  {
>  	pci_err(dev, "  TLP Header: %08x %08x %08x %08x\n",
> @@ -713,7 +752,8 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
>  	int layer, agent;
>  	int id = pci_dev_id(dev);
>  	const char *level;
> -	struct ratelimit_state *ratelimit;
> +	struct ratelimit_state *irq_ratelimit;
> +	struct ratelimit_state *log_ratelimit;
>  
>  	if (!info->status) {
>  		pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n",
> @@ -722,14 +762,20 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
>  	}
>  
>  	if (info->severity == AER_CORRECTABLE) {
> -		ratelimit = &dev->aer_info->cor_log_ratelimit;
> +		irq_ratelimit = &dev->aer_info->cor_irq_ratelimit;
> +		log_ratelimit = &dev->aer_info->cor_log_ratelimit;
>  		level = KERN_WARNING;
>  	} else {
> -		ratelimit = &dev->aer_info->uncor_log_ratelimit;
> +		irq_ratelimit = &dev->aer_info->uncor_irq_ratelimit;
> +		log_ratelimit = &dev->aer_info->uncor_log_ratelimit;
>  		level = KERN_ERR;
>  	}
>  
> -	if (!__ratelimit(ratelimit))
> +	if (!__ratelimit(irq_ratelimit)) {
> +		mask_reported_error(dev, info);
> +		return;
So if I follow correctly.  We count irqs for any error type and
then mask whatever was set on one that triggered this rate_limit check?
That last one isn't reported other than via a log message.

Imagine that is a totally unrelated error to the earlier ones,
now RASDaemon has no info on it at all as the tracepoint never
fired.  To me that's a very different situation to it knowing there
were 10 errors of the type vs more.

I'd like to see that final trace point and also to see a tracepoint
that lets rasdaemon etc know you cut off errors after this point
+ rasdaemon support for using that.

Terry can address if we need to do anything different for CXL given
he is updating this handling for that.  Superficially I think we
need to driver the masking down into the CXL RAS capability registers.
Internal error in the AER capabilities is far to big a hammer to apply.

> +	}
> +	if (!__ratelimit(log_ratelimit))
>  		return;
>  
>  	layer = AER_GET_LAYER_ERROR(info->severity, info->status);
> @@ -776,14 +822,14 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity,
>  	int layer, agent, tlp_header_valid = 0;
>  	u32 status, mask;
>  	struct aer_err_info info;
> -	struct ratelimit_state *ratelimit;
> +	struct ratelimit_state *log_ratelimit;
>  
>  	if (aer_severity == AER_CORRECTABLE) {
> -		ratelimit = &dev->aer_info->cor_log_ratelimit;
> +		log_ratelimit = &dev->aer_info->cor_log_ratelimit;
>  		status = aer->cor_status;
>  		mask = aer->cor_mask;
>  	} else {
> -		ratelimit = &dev->aer_info->uncor_log_ratelimit;
> +		log_ratelimit = &dev->aer_info->uncor_log_ratelimit;
>  		status = aer->uncor_status;
>  		mask = aer->uncor_mask;
>  		tlp_header_valid = status & AER_LOG_TLP_MASKS;
> @@ -799,8 +845,8 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity,
>  	info.first_error = PCI_ERR_CAP_FEP(aer->cap_control);
>  
>  	pci_dev_aer_stats_incr(dev, &info);
> -
> -	if (!__ratelimit(ratelimit))
> +	/* Only ratelimit logs (no IRQ) as AERs reported via GHES/CXL (caller). */
> +	if (!__ratelimit(log_ratelimit))
>  		return;
>  
>  	pci_err(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask);
Karolina Stolarek Feb. 6, 2025, 1:56 p.m. UTC | #7
On 25/01/2025 08:39, Lukas Wunner wrote:
> 
> Masking errors at the register level feels overzealous,
> in particular because it also disables logging via tracepoints.
> 
> Is there a concrete device that necessitates this change?

I faced issues with excessive Correctable Errors reporting with Samsung 
PM1733 NVMe (a couple of thousand errors per hour), which were still 
polluting the logs even after introducing a ratelimit (first every 2s, 
second ever 30s, as proposed in [1]). Also, instead of masking the 
errors automatically, we could give a user a sysfs knob to turn error 
generation off and on.

> If there is, consider adding a quirk for this particular device
> which masks specific errors, but doesn't affect other devices.

There were many other reports of Correctable Error floods, as signaled 
in the cover letter, so it's hard to pinpoint the specific driver that 
should mask these errors.

All the best,
Karolina

-------------------------------------
[1] - 
https://lore.kernel.org/linux-pci/cover.1736341506.git.karolina.stolarek@oracle.com/

> If there isn't, consider dropping this change until a buggy device
> appears that actually needs it.
> 
> Thanks,
> 
> Lukas
Lukas Wunner Feb. 6, 2025, 8:25 p.m. UTC | #8
On Thu, Feb 06, 2025 at 02:56:20PM +0100, Karolina Stolarek wrote:
> On 25/01/2025 08:39, Lukas Wunner wrote:
> > Masking errors at the register level feels overzealous,
> > in particular because it also disables logging via tracepoints.
> > 
> > Is there a concrete device that necessitates this change?
> 
> I faced issues with excessive Correctable Errors reporting with Samsung
> PM1733 NVMe (a couple of thousand errors per hour), which were still
> polluting the logs even after introducing a ratelimit

I'd suggest to add a "u32 aer_cor_mask" to "struct pci_dev" in the
"#ifdef CONFIG_PCIEAER" section.

Then add a "DECLARE_PCI_FIXUP_HEADER()" macro in drivers/pci/quirks.c
for the Samsung PM1733 which calls a new function which sets exactly the
error bits you're seeing to aer_cor_mask.  This should be #ifdef'ed to
CONFIG_PCIEAER as well.

Finally, amend aer.c to set the bits in aer_cor_mask in the
PCI_ERR_COR_MASK register on probe.


> > If there is, consider adding a quirk for this particular device
> > which masks specific errors, but doesn't affect other devices.
> 
> There were many other reports of Correctable Error floods, as signaled in
> the cover letter, so it's hard to pinpoint the specific driver that should
> mask these errors.

If a specific device frequently signals the same errors,
I think that's a bug of that device and if the vendor doesn't
provide a firmware update, quiescing the errors through a quirk
is a plausible solution.

Of course if this is widespread, it becomes a maintenance nightmare
and then the quirk approach is not a viable option.  I cannot say
whether that's the case.  So far there's a report for one specific
product (the Samsung drive) and hinting that the problem may be
widespread.  It's difficult to make a recommendation without
precise data.

Thanks,

Lukas
diff mbox series

Patch

diff --git a/Documentation/PCI/pcieaer-howto.rst b/Documentation/PCI/pcieaer-howto.rst
index 5546de60f184..d41272504b18 100644
--- a/Documentation/PCI/pcieaer-howto.rst
+++ b/Documentation/PCI/pcieaer-howto.rst
@@ -88,8 +88,8 @@  fields.
 AER Ratelimits
 -------------------------
 
-Error messages are ratelimited per device and error type. This prevents spammy
-devices from flooding the console.
+Errors, both at log and IRQ level, are ratelimited per device and error type.
+This prevents spammy devices from stalling execution.
 
 AER Statistics / Counters
 -------------------------
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 025c50b0f293..1db70ae87f52 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -87,6 +87,8 @@  struct aer_info {
 	u64 rootport_total_nonfatal_errs;
 
 	/* Ratelimits for errors */
+	struct ratelimit_state cor_irq_ratelimit;
+	struct ratelimit_state uncor_irq_ratelimit;
 	struct ratelimit_state cor_log_ratelimit;
 	struct ratelimit_state uncor_log_ratelimit;
 };
@@ -379,6 +381,10 @@  void pci_aer_init(struct pci_dev *dev)
 		return;
 
 	dev->aer_info = kzalloc(sizeof(struct aer_info), GFP_KERNEL);
+	ratelimit_state_init(&dev->aer_info->cor_irq_ratelimit,
+			     DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST*3);
+	ratelimit_state_init(&dev->aer_info->uncor_irq_ratelimit,
+			     DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST*3);
 	ratelimit_state_init(&dev->aer_info->cor_log_ratelimit,
 			     DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST);
 	ratelimit_state_init(&dev->aer_info->uncor_log_ratelimit,
@@ -676,6 +682,39 @@  static void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
 	}
 }
 
+static void mask_reported_error(struct pci_dev *dev, struct aer_err_info *info)
+{
+	const char **strings;
+	const char *errmsg;
+	u16 aer_offset = dev->aer_cap;
+	u16 mask_reg_offset;
+	u32 mask;
+	unsigned long status = info->status;
+	int i;
+
+	if (info->severity == AER_CORRECTABLE) {
+		strings = aer_correctable_error_string;
+		mask_reg_offset = PCI_ERR_COR_MASK;
+	} else {
+		strings = aer_uncorrectable_error_string;
+		mask_reg_offset = PCI_ERR_UNCOR_MASK;
+	}
+
+	pci_read_config_dword(dev, aer_offset + mask_reg_offset, &mask);
+	mask |= status;
+	pci_write_config_dword(dev, aer_offset + mask_reg_offset, mask);
+
+	pci_warn(dev, "%s error(s) masked due to rate-limiting:",
+		 aer_error_severity_string[info->severity]);
+	for_each_set_bit(i, &status, 32) {
+		errmsg = strings[i];
+		if (!errmsg)
+			errmsg = "Unknown Error Bit";
+
+		pci_warn(dev, "   [%2d] %-22s\n", i, errmsg);
+	}
+}
+
 static void __print_tlp_header(struct pci_dev *dev, struct pcie_tlp_log *t)
 {
 	pci_err(dev, "  TLP Header: %08x %08x %08x %08x\n",
@@ -713,7 +752,8 @@  void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
 	int layer, agent;
 	int id = pci_dev_id(dev);
 	const char *level;
-	struct ratelimit_state *ratelimit;
+	struct ratelimit_state *irq_ratelimit;
+	struct ratelimit_state *log_ratelimit;
 
 	if (!info->status) {
 		pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n",
@@ -722,14 +762,20 @@  void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
 	}
 
 	if (info->severity == AER_CORRECTABLE) {
-		ratelimit = &dev->aer_info->cor_log_ratelimit;
+		irq_ratelimit = &dev->aer_info->cor_irq_ratelimit;
+		log_ratelimit = &dev->aer_info->cor_log_ratelimit;
 		level = KERN_WARNING;
 	} else {
-		ratelimit = &dev->aer_info->uncor_log_ratelimit;
+		irq_ratelimit = &dev->aer_info->uncor_irq_ratelimit;
+		log_ratelimit = &dev->aer_info->uncor_log_ratelimit;
 		level = KERN_ERR;
 	}
 
-	if (!__ratelimit(ratelimit))
+	if (!__ratelimit(irq_ratelimit)) {
+		mask_reported_error(dev, info);
+		return;
+	}
+	if (!__ratelimit(log_ratelimit))
 		return;
 
 	layer = AER_GET_LAYER_ERROR(info->severity, info->status);
@@ -776,14 +822,14 @@  void pci_print_aer(struct pci_dev *dev, int aer_severity,
 	int layer, agent, tlp_header_valid = 0;
 	u32 status, mask;
 	struct aer_err_info info;
-	struct ratelimit_state *ratelimit;
+	struct ratelimit_state *log_ratelimit;
 
 	if (aer_severity == AER_CORRECTABLE) {
-		ratelimit = &dev->aer_info->cor_log_ratelimit;
+		log_ratelimit = &dev->aer_info->cor_log_ratelimit;
 		status = aer->cor_status;
 		mask = aer->cor_mask;
 	} else {
-		ratelimit = &dev->aer_info->uncor_log_ratelimit;
+		log_ratelimit = &dev->aer_info->uncor_log_ratelimit;
 		status = aer->uncor_status;
 		mask = aer->uncor_mask;
 		tlp_header_valid = status & AER_LOG_TLP_MASKS;
@@ -799,8 +845,8 @@  void pci_print_aer(struct pci_dev *dev, int aer_severity,
 	info.first_error = PCI_ERR_CAP_FEP(aer->cap_control);
 
 	pci_dev_aer_stats_incr(dev, &info);
-
-	if (!__ratelimit(ratelimit))
+	/* Only ratelimit logs (no IRQ) as AERs reported via GHES/CXL (caller). */
+	if (!__ratelimit(log_ratelimit))
 		return;
 
 	pci_err(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask);