Message ID | 20240222181324.28242-2-fancer.lancer@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | EDAC/mc/synopsys: Various fixes and cleanups | expand |
On Thu, Feb 22, 2024 at 09:12:46PM +0300, Serge Semin wrote: > The race condition around the ECCCLR register access happens in the IRQ > disable method called in the device remove() procedure and in the ECC IRQ > handler: > 1. Enable IRQ: > a. ECCCLR = EN_CE | EN_UE > 2. Disable IRQ: > a. ECCCLR = 0 > 3. IRQ handler: > a. ECCCLR = CLR_CE | CLR_CE_CNT | CLR_CE | CLR_CE_CNT > b. ECCCLR = 0 > c. ECCCLR = EN_CE | EN_UE > So if the IRQ disabling procedure is called concurrently with the IRQ > handler method the IRQ might be actually left enabled due to the > statement 3c. > > The root cause of the problem is that ECCCLR register (which since v3.10a > has been called as ECCCTL) has intermixed ECC status data clear flags and > the IRQ enable/disable flags. Thus the IRQ disabling (clear EN flags) and > handling (write 1 to clear ECC status data) procedures must be serialised > around the ECCCTL register modification to prevent the race. > > So fix the problem described above by adding the spin-lock around the > ECCCLR modifications and preventing the IRQ-handler from modifying the > IRQs enable flags (there is no point in disabling the IRQ and then > re-enabling it again within a single IRQ handler call, see the statements > 3a/3b and 3c above). So I'm looking at the code and am looking at this and wondering how we even ended up in this mess?! An interrupt handler should not *enable* the interrupt again - that's just crazy. And I should've seen that in 4bcffe941758 ("EDAC/synopsys: Re-enable the error interrupts on v3 hw") and stopped it right there. But well, it is what it is... So I'd like to see the following flow: * on init, the interrupt is enabled with enable_intr() *after* registering the interrupt handler. * on exit, the interrupt is disabled with disable_intr() and then no interrupts are coming in anymore. And then I don't think you'll need the spinlock and it'll be sane design. Right?
On Mon, Apr 15, 2024 at 08:36:16PM +0200, Borislav Petkov wrote: > On Thu, Feb 22, 2024 at 09:12:46PM +0300, Serge Semin wrote: > > The race condition around the ECCCLR register access happens in the IRQ > > disable method called in the device remove() procedure and in the ECC IRQ > > handler: > > 1. Enable IRQ: > > a. ECCCLR = EN_CE | EN_UE > > 2. Disable IRQ: > > a. ECCCLR = 0 > > 3. IRQ handler: > > a. ECCCLR = CLR_CE | CLR_CE_CNT | CLR_CE | CLR_CE_CNT > > b. ECCCLR = 0 > > c. ECCCLR = EN_CE | EN_UE > > So if the IRQ disabling procedure is called concurrently with the IRQ > > handler method the IRQ might be actually left enabled due to the > > statement 3c. > > > > The root cause of the problem is that ECCCLR register (which since v3.10a > > has been called as ECCCTL) has intermixed ECC status data clear flags and > > the IRQ enable/disable flags. Thus the IRQ disabling (clear EN flags) and > > handling (write 1 to clear ECC status data) procedures must be serialised > > around the ECCCTL register modification to prevent the race. > > > > So fix the problem described above by adding the spin-lock around the > > ECCCLR modifications and preventing the IRQ-handler from modifying the > > IRQs enable flags (there is no point in disabling the IRQ and then > > re-enabling it again within a single IRQ handler call, see the statements > > 3a/3b and 3c above). > > So I'm looking at the code and am looking at this and wondering how we > even ended up in this mess?! > > An interrupt handler should not *enable* the interrupt again - that's > just crazy. And I should've seen that in > > 4bcffe941758 ("EDAC/synopsys: Re-enable the error interrupts on v3 hw") > > and stopped it right there. But well, it is what it is... It looks indeed crazy because the method is called enable_intr() and is called in the IRQ handler. Right, re-enabling the IRQ in the handler doesn't look good. But under the hood it was just a way to fix the problem described in the commit you cited. enable_intr() just gets back the IRQ Enable flags cleared a bit before in the zynqmp_get_error_info() method. The root cause of the problem is that the IRQ status/clear flags: ECCCLR.ecc_corrected_err_clr (R/W1C) ECCCLR.ecc_uncorrected_err_clr (R/W1C) ECCCLR.ecc_corr_err_cnt_clr (R/W1C) ECCCLR.ecc_uncorr_err_cnt_clr (R/W1C) etc and the IRQ enable/disable flags (since v3.10a): ECCCLR.ecc_corrected_err_intr_en (R/W) ECCCLR.ecc_uncorrected_err_intr_en (R/W) reside in a single register - ECCCLR (Synopsys has renamed it to ECCCTL since v3.10a due to adding the IRQ En/Dis flags). Thus any concurrent access to that CSR like "Clear IRQ status/counters" and "IRQ disable/enable" need to be protected from the race condition. > > So I'd like to see the following flow: > > * on init, the interrupt is enabled with enable_intr() *after* > registering the interrupt handler. > > * on exit, the interrupt is disabled with disable_intr() and then no > interrupts are coming in anymore. > > And then I don't think you'll need the spinlock and it'll be sane > design. > > Right? This is what is implemented at the moment and it's racy. IRQ-handler clears the IRQ status/counters by writing 1s to the respective flags in the ECCCLR register. This inevitable causes writing to the IRQ Enable/disable bits. Thus if during the IRQ handling the driver/device gets to be removed (disable_intr() is called), the IRQ might be left enabled despite of having the disable_intr() method executed. In its turn that may cause fatal problems if the IRQ handler is executed once again before the IRQ line is freed and disabled in the GIC side. If we wish to avoid using the atomic spin-lock we'll need to change the order: 0. Enable ECC IRQ in ECCCLR/ECCCTL CSR 1. Request IRQ line and register the IRQ handler ... 2. Free IRQ line 3. Disable ECC IRQ in ECCCLR/ECCCTL CSR But if that path is decided to be taken the next aspects will need to be taken into account: 1. If the IRQ line is shared, then the ECC IRQ might be delivered somewhere between steps 0 and 1, which won't make the IRQ subsystem happy. (Although this isn't actual for the current driver because it requests the non-shared IRQ line for ECC, but who knows how the situation will change in future). 2. A bit later (in the patchset #3) I am adding the ECC Scrubber support, which will need the spin-lock anyway to protect another two CSRs access. These CSRs are touched in run-time to enable/disable the scrubbing and set/get the scrub rate. So since we'll need to have a spin-lock anyway and from the scalability point of view, it sounds reasonable to keep what is suggested in my patch. What do you think? -Serge(y) > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette
On Tue, Apr 16, 2024 at 01:06:11PM +0300, Serge Semin wrote: > It looks indeed crazy because the method is called enable_intr() and > is called in the IRQ handler. Right, re-enabling the IRQ in the handler > doesn't look good. But under the hood it was just a way to fix the > problem described in the commit you cited. enable_intr() just gets > back the IRQ Enable flags cleared a bit before in the > zynqmp_get_error_info() method. > > The root cause of the problem is that the IRQ status/clear flags: > ECCCLR.ecc_corrected_err_clr (R/W1C) > ECCCLR.ecc_uncorrected_err_clr (R/W1C) > ECCCLR.ecc_corr_err_cnt_clr (R/W1C) > ECCCLR.ecc_uncorr_err_cnt_clr (R/W1C) > etc > > and the IRQ enable/disable flags (since v3.10a): > ECCCLR.ecc_corrected_err_intr_en (R/W) > ECCCLR.ecc_uncorrected_err_intr_en (R/W) > > reside in a single register - ECCCLR (Synopsys has renamed it to > ECCCTL since v3.10a due to adding the IRQ En/Dis flags). > > Thus any concurrent access to that CSR like "Clear IRQ > status/counters" and "IRQ disable/enable" need to be protected from > the race condition. Ok, let's pick this apart one-by-one. I'll return to the rest you're explaining as needed. So, can writes to the status/counter bits while writing the *same* bit to the IRQ enable/disable bit prevent any race conditions? Meaning, you only change the status and counter bits and you preserve the same value in the IRQ disable/enable bit? IOW, I'm thinking of shadowing that ECCCTL in software so that we update it from the shadowed value. Because, AFAIU, the spinlock won't help if you grab it, clear the status/counter bits and disable the interrupt in the process. You want to only clear the status/counter bits and leave the interrupt enabled. Right? IOW, in one single write you do: ECCCLR.ecc_corrected_err_clr=1 ECCCLR.ecc_uncorrected_err_clr=1 ECCCLR.ecc_corr_err_cnt_clr=1 ECCCLR.ecc_uncorr_err_cnt_clr=1 ECCCLR.ecc_corrected_err_intr_en=1 ECCCLR.ecc_uncorrected_err_intr_en=1 ? Thx.
On Sun, Apr 21, 2024 at 12:07:30PM +0200, Borislav Petkov wrote: > On Tue, Apr 16, 2024 at 01:06:11PM +0300, Serge Semin wrote: > > It looks indeed crazy because the method is called enable_intr() and > > is called in the IRQ handler. Right, re-enabling the IRQ in the handler > > doesn't look good. But under the hood it was just a way to fix the > > problem described in the commit you cited. enable_intr() just gets > > back the IRQ Enable flags cleared a bit before in the > > zynqmp_get_error_info() method. > > > > The root cause of the problem is that the IRQ status/clear flags: > > ECCCLR.ecc_corrected_err_clr (R/W1C) > > ECCCLR.ecc_uncorrected_err_clr (R/W1C) > > ECCCLR.ecc_corr_err_cnt_clr (R/W1C) > > ECCCLR.ecc_uncorr_err_cnt_clr (R/W1C) > > etc > > > > and the IRQ enable/disable flags (since v3.10a): > > ECCCLR.ecc_corrected_err_intr_en (R/W) > > ECCCLR.ecc_uncorrected_err_intr_en (R/W) > > > > reside in a single register - ECCCLR (Synopsys has renamed it to > > ECCCTL since v3.10a due to adding the IRQ En/Dis flags). > > > > Thus any concurrent access to that CSR like "Clear IRQ > > status/counters" and "IRQ disable/enable" need to be protected from > > the race condition. > > Ok, let's pick this apart one-by-one. I'll return to the rest you're > explaining as needed. > > So, can writes to the status/counter bits while writing the *same* bit > to the IRQ enable/disable bit prevent any race conditions? No, because the clear and enable/disable bits belong to the same CSR. While you are writing the clear+same/enable bits, the concurrent IO may have changed the same/enable bits. Like this: IRQ-handler | IRQ-disabler | tmp = clear_sts_bits | enable_irq_bits;| | ECCCLR = 0; // disable IRQ ECCCLR = tmp; | ----------------------------------------+-------------------------------------- As a result even though the IRQ-disabler cleared the IRQ-enable bits, the IRQ-handler got them back to being set. The same will happen if we get to write the *same* bits in the handler: IRQ-handler | IRQ-disabler | tmp = ECCCLR | clear_sts_bits; | | ECCCLR = 0; // disable IRQs ECCCLR = tmp; | ----------------------------------------+-------------------------------------- The last example is almost the same as what happens at the moment and what I am fixing in this patch. The difference is that there is a greater number of ECCCLR CSR changes performed in the IRQ-handler context, which makes the critical section even wider than it could be: IRQ-handler | IRQ-disabler | zynqmp_get_error_info: | ECCCLR = clear_sts_bits; | ECCCLR = 0; // actually redundant | ... | ECCCLR = 0; // disable IRQs enable_intr: | ECCCLR = enable_irq_bits; | ----------------------------------------+-------------------------------------- > > Meaning, you only change the status and counter bits and you preserve > the same value in the IRQ disable/enable bit? AFAICS this won't help to solve the race condition because writing the preserved value of the enable/disable bits is the cause of the race condition. The critical section is in concurrent flushing of different values to the ECCCLR.*en bits. The only ways to solve that are: 1. prevent the concurrent access 2. serialize the critical section > > IOW, I'm thinking of shadowing that ECCCTL in software so that we update > it from the shadowed value. I don't see the shadowing will help to prevent what is happening unless you know some shadow-register pattern I am not aware of. AFAIR the shadow register is normally utilized for the cases of: 1. read ops returns an incorrect value or a CSR couldn't be read 2. IO bus is too slow in order to speed-up the RMW-pattern In any case the shadowed value and the process of the data flushing would need to be protected with a lock anyway in order to sync the shadow register content and the actual value written to the CSR. > > Because, AFAIU, the spinlock won't help if you grab it, clear the > status/counter bits and disable the interrupt in the process. You want > to only clear the status/counter bits and leave the interrupt enabled. > > Right? Right, but the spinlock will help. What I need to do deal with two concurrent operations: IRQ-handler: clear the status/counter bits and leave the IRQ enable bits as is. IRQ-disabler: clear the IRQ enable bits These actions need to be serialized in order to prevent the race condition. > > IOW, in one single write you do: > > ECCCLR.ecc_corrected_err_clr=1 > ECCCLR.ecc_uncorrected_err_clr=1 > ECCCLR.ecc_corr_err_cnt_clr=1 > ECCCLR.ecc_uncorr_err_cnt_clr=1 > ECCCLR.ecc_corrected_err_intr_en=1 > ECCCLR.ecc_uncorrected_err_intr_en=1 > > ? This won't be help because the concurrent IRQ-disabler could have already cleared the IRQ enable bits while the IRQ-handler is being executed and about to write to the ECCCLR register. Like this: IRQ-handler | IRQ-disabler | tmp = clear_sts_bits | enable_irq_bits;| | ECCCLR = 0; // disable IRQ ECCCLR = tmp; | ----------------------------------------+-------------------------------------- Even if we get to add the spin-lock serializing the ECCCLR writes it won't solve the problem since the IRQ-disabler critical section could be executed a bit before the IRQ-handler critical section so the later one will just re-enable the IRQs disabled by the former one. Here is what is suggested in my patch to fix the problem: IRQ-handler | IRQ-disabler | zynqmp_get_error_info: | | lock_irqsave | ECCCLR = 0; // disable IRQs | unlock_irqrestore lock_irqsave; | tmp = ECCCLR | clear_sts_bits; | ECCCLR = tmp; | unlock_irqrestore; | ----------------------------------------+-------------------------------------- See, the IRQ-status/counters clearing and IRQ disabling processes are serialized so the former one wouldn't override the values written by the later one. Here is the way it would have looked in case of the shadow-register implementation: IRQ-handler | IRQ-disabler | zynqmp_get_error_info: | | lock_irqsave | shadow_en_bits = 0; | ECCCLR = shadow_en_bits; // disable IRQs | unlock_irqrestore lock_irqsave; | tmp = clear_sts_bits | shadow_en_bits; | ECCCLR = tmp; | unlock_irqrestore; | ----------------------------------------+-------------------------------------- The shadow-register pattern just prevents one ECCCLR read op. The shadowed data sync would have needed the serialization anyway. Seeing the DW DDR uMCTL2 controller CSRs are always memory mapped, I don't see using the shadow-register CSR would worth being implemented unless you meant something different. -Serge(y) > > Thx. > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Apr 25, 2024 at 03:52:38PM +0300, Serge Semin wrote: > Even if we get to add the spin-lock serializing the ECCCLR writes it > won't solve the problem since the IRQ-disabler critical section could > be executed a bit before the IRQ-handler critical section so the later > one will just re-enable the IRQs disabled by the former one. > > Here is what is suggested in my patch to fix the problem: > > IRQ-handler | IRQ-disabler > | > zynqmp_get_error_info: | > | lock_irqsave > | ECCCLR = 0; // disable IRQs > | unlock_irqrestore > lock_irqsave; | > tmp = ECCCLR | clear_sts_bits; | > ECCCLR = tmp; | > unlock_irqrestore; | <--- I'm presuming here the IRQ-disabler will reenable interrupts at some point? Otherwise we have the same problem as before when interrupts remain off after the IRQ handler has run. Other than that, yes, I see it, we will need the locking. Thanks for elaborating.
On Mon, May 06, 2024 at 12:20:29PM +0200, Borislav Petkov wrote: > On Thu, Apr 25, 2024 at 03:52:38PM +0300, Serge Semin wrote: > > Even if we get to add the spin-lock serializing the ECCCLR writes it > > won't solve the problem since the IRQ-disabler critical section could > > be executed a bit before the IRQ-handler critical section so the later > > one will just re-enable the IRQs disabled by the former one. > > > > Here is what is suggested in my patch to fix the problem: > > > > IRQ-handler | IRQ-disabler > > | > > zynqmp_get_error_info: | > > | lock_irqsave > > | ECCCLR = 0; // disable IRQs > > | unlock_irqrestore > > lock_irqsave; | > > tmp = ECCCLR | clear_sts_bits; | > > ECCCLR = tmp; | > > unlock_irqrestore; | > > <--- I'm presuming here the IRQ-disabler will reenable interrupts at > some point? > > Otherwise we have the same problem as before when interrupts remain off > after the IRQ handler has run. In the sketch above the IRQ-disabler is the method which disables the IRQ in the concurrent manner. After my patch is applied the IRQ-handler will no longer touch the IRQ enable/disable bits, but will preserve them as is: - clearval = ECC_CTRL_CLR_CE_ERR | ECC_CTRL_CLR_CE_ERRCNT; - clearval |= ECC_CTRL_CLR_UE_ERR | ECC_CTRL_CLR_UE_ERRCNT; + spin_lock_irqsave(&priv->reglock, flags); + + clearval = readl(base + ECC_CLR_OFST) | + ECC_CTRL_CLR_CE_ERR | ECC_CTRL_CLR_CE_ERRCNT | + ECC_CTRL_CLR_UE_ERR | ECC_CTRL_CLR_UE_ERRCNT; writel(clearval, base + ECC_CLR_OFST); - writel(0x0, base + ECC_CLR_OFST); + + spin_unlock_irqrestore(&priv->reglock, flags); Thus there won't be need in the IRQs re-enabling later in the handler: @@ -576,8 +601,6 @@ static irqreturn_t intr_handler(int irq, void *dev_id) /* v3.0 of the controller does not have this register */ if (!(priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR)) writel(regval, priv->baseaddr + DDR_QOS_IRQ_STAT_OFST); - else - enable_intr(priv); So the only IRQ-disabler left in the driver - disable_intr() - will be called from the device/driver remove() function. The ECCCLR CSR access will be guarded with the spin-lock in the IRQ-disabler and in the IRQ-handler. So it will be safe to have them executed concurrently. > > Other than that, yes, I see it, we will need the locking. > > Thanks for elaborating. Always welcome. Glad we've settled this.) -Serge(y) > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette
On Mon, May 06, 2024 at 02:27:50PM +0300, Serge Semin wrote:
> Always welcome. Glad we've settled this.)
Yap, it looks good so far.
Lemme queue it into urgent and send it Linuswards soon-ish.
Thx.
diff --git a/drivers/edac/synopsys_edac.c b/drivers/edac/synopsys_edac.c index 709babce43ba..0168b05e3ca1 100644 --- a/drivers/edac/synopsys_edac.c +++ b/drivers/edac/synopsys_edac.c @@ -9,6 +9,7 @@ #include <linux/edac.h> #include <linux/module.h> #include <linux/platform_device.h> +#include <linux/spinlock.h> #include <linux/interrupt.h> #include <linux/of.h> @@ -299,6 +300,7 @@ struct synps_ecc_status { /** * struct synps_edac_priv - DDR memory controller private instance data. * @baseaddr: Base address of the DDR controller. + * @reglock: Concurrent CSRs access lock. * @message: Buffer for framing the event specific info. * @stat: ECC status information. * @p_data: Platform data. @@ -313,6 +315,7 @@ struct synps_ecc_status { */ struct synps_edac_priv { void __iomem *baseaddr; + spinlock_t reglock; char message[SYNPS_EDAC_MSG_SIZE]; struct synps_ecc_status stat; const struct synps_platform_data *p_data; @@ -408,7 +411,8 @@ static int zynq_get_error_info(struct synps_edac_priv *priv) static int zynqmp_get_error_info(struct synps_edac_priv *priv) { struct synps_ecc_status *p; - u32 regval, clearval = 0; + u32 regval, clearval; + unsigned long flags; void __iomem *base; base = priv->baseaddr; @@ -452,10 +456,14 @@ static int zynqmp_get_error_info(struct synps_edac_priv *priv) p->ueinfo.blknr = (regval & ECC_CEADDR1_BLKNR_MASK); p->ueinfo.data = readl(base + ECC_UESYND0_OFST); out: - clearval = ECC_CTRL_CLR_CE_ERR | ECC_CTRL_CLR_CE_ERRCNT; - clearval |= ECC_CTRL_CLR_UE_ERR | ECC_CTRL_CLR_UE_ERRCNT; + spin_lock_irqsave(&priv->reglock, flags); + + clearval = readl(base + ECC_CLR_OFST) | + ECC_CTRL_CLR_CE_ERR | ECC_CTRL_CLR_CE_ERRCNT | + ECC_CTRL_CLR_UE_ERR | ECC_CTRL_CLR_UE_ERRCNT; writel(clearval, base + ECC_CLR_OFST); - writel(0x0, base + ECC_CLR_OFST); + + spin_unlock_irqrestore(&priv->reglock, flags); return 0; } @@ -515,24 +523,41 @@ static void handle_error(struct mem_ctl_info *mci, struct synps_ecc_status *p) static void enable_intr(struct synps_edac_priv *priv) { + unsigned long flags; + /* Enable UE/CE Interrupts */ - if (priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR) - writel(DDR_UE_MASK | DDR_CE_MASK, - priv->baseaddr + ECC_CLR_OFST); - else + if (!(priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR)) { writel(DDR_QOSUE_MASK | DDR_QOSCE_MASK, priv->baseaddr + DDR_QOS_IRQ_EN_OFST); + return; + } + + spin_lock_irqsave(&priv->reglock, flags); + + writel(DDR_UE_MASK | DDR_CE_MASK, + priv->baseaddr + ECC_CLR_OFST); + + spin_unlock_irqrestore(&priv->reglock, flags); } static void disable_intr(struct synps_edac_priv *priv) { + unsigned long flags; + /* Disable UE/CE Interrupts */ - if (priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR) - writel(0x0, priv->baseaddr + ECC_CLR_OFST); - else + if (!(priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR)) { writel(DDR_QOSUE_MASK | DDR_QOSCE_MASK, priv->baseaddr + DDR_QOS_IRQ_DB_OFST); + + return; + } + + spin_lock_irqsave(&priv->reglock, flags); + + writel(0, priv->baseaddr + ECC_CLR_OFST); + + spin_unlock_irqrestore(&priv->reglock, flags); } /** @@ -576,8 +601,6 @@ static irqreturn_t intr_handler(int irq, void *dev_id) /* v3.0 of the controller does not have this register */ if (!(priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR)) writel(regval, priv->baseaddr + DDR_QOS_IRQ_STAT_OFST); - else - enable_intr(priv); return IRQ_HANDLED; } @@ -1359,6 +1382,7 @@ static int mc_probe(struct platform_device *pdev) priv = mci->pvt_info; priv->baseaddr = baseaddr; priv->p_data = p_data; + spin_lock_init(&priv->reglock); mc_init(mci, pdev);
The race condition around the ECCCLR register access happens in the IRQ disable method called in the device remove() procedure and in the ECC IRQ handler: 1. Enable IRQ: a. ECCCLR = EN_CE | EN_UE 2. Disable IRQ: a. ECCCLR = 0 3. IRQ handler: a. ECCCLR = CLR_CE | CLR_CE_CNT | CLR_CE | CLR_CE_CNT b. ECCCLR = 0 c. ECCCLR = EN_CE | EN_UE So if the IRQ disabling procedure is called concurrently with the IRQ handler method the IRQ might be actually left enabled due to the statement 3c. The root cause of the problem is that ECCCLR register (which since v3.10a has been called as ECCCTL) has intermixed ECC status data clear flags and the IRQ enable/disable flags. Thus the IRQ disabling (clear EN flags) and handling (write 1 to clear ECC status data) procedures must be serialised around the ECCCTL register modification to prevent the race. So fix the problem described above by adding the spin-lock around the ECCCLR modifications and preventing the IRQ-handler from modifying the IRQs enable flags (there is no point in disabling the IRQ and then re-enabling it again within a single IRQ handler call, see the statements 3a/3b and 3c above). Fixes: f7824ded4149 ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR") Signed-off-by: Serge Semin <fancer.lancer@gmail.com> --- Cc: Sherry Sun <sherry.sun@nxp.com> Changelog v4: - This is a new patch detached from [PATCH v3 01/17] EDAC/synopsys: Fix native uMCTL2 IRQs handling procedure - Rename lock to reglock (Borislav) --- drivers/edac/synopsys_edac.c | 50 ++++++++++++++++++++++++++---------- 1 file changed, 37 insertions(+), 13 deletions(-)