diff mbox series

[v5,01/20] EDAC/synopsys: Fix ECC status data and IRQ disable race condition

Message ID 20240222181324.28242-2-fancer.lancer@gmail.com (mailing list archive)
State New, archived
Headers show
Series EDAC/mc/synopsys: Various fixes and cleanups | expand

Commit Message

Serge Semin Feb. 22, 2024, 6:12 p.m. UTC
The race condition around the ECCCLR register access happens in the IRQ
disable method called in the device remove() procedure and in the ECC IRQ
handler:
1. Enable IRQ:
   a. ECCCLR = EN_CE | EN_UE
2. Disable IRQ:
   a. ECCCLR = 0
3. IRQ handler:
   a. ECCCLR = CLR_CE | CLR_CE_CNT | CLR_CE | CLR_CE_CNT
   b. ECCCLR = 0
   c. ECCCLR = EN_CE | EN_UE
So if the IRQ disabling procedure is called concurrently with the IRQ
handler method the IRQ might be actually left enabled due to the
statement 3c.

The root cause of the problem is that ECCCLR register (which since v3.10a
has been called as ECCCTL) has intermixed ECC status data clear flags and
the IRQ enable/disable flags. Thus the IRQ disabling (clear EN flags) and
handling (write 1 to clear ECC status data) procedures must be serialised
around the ECCCTL register modification to prevent the race.

So fix the problem described above by adding the spin-lock around the
ECCCLR modifications and preventing the IRQ-handler from modifying the
IRQs enable flags (there is no point in disabling the IRQ and then
re-enabling it again within a single IRQ handler call, see the statements
3a/3b and 3c above).

Fixes: f7824ded4149 ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR")
Signed-off-by: Serge Semin <fancer.lancer@gmail.com>

---

Cc: Sherry Sun <sherry.sun@nxp.com>

Changelog v4:
- This is a new patch detached from
  [PATCH v3 01/17] EDAC/synopsys: Fix native uMCTL2 IRQs handling procedure
- Rename lock to reglock (Borislav)
---
 drivers/edac/synopsys_edac.c | 50 ++++++++++++++++++++++++++----------
 1 file changed, 37 insertions(+), 13 deletions(-)

Comments

Borislav Petkov April 15, 2024, 6:36 p.m. UTC | #1
On Thu, Feb 22, 2024 at 09:12:46PM +0300, Serge Semin wrote:
> The race condition around the ECCCLR register access happens in the IRQ
> disable method called in the device remove() procedure and in the ECC IRQ
> handler:
> 1. Enable IRQ:
>    a. ECCCLR = EN_CE | EN_UE
> 2. Disable IRQ:
>    a. ECCCLR = 0
> 3. IRQ handler:
>    a. ECCCLR = CLR_CE | CLR_CE_CNT | CLR_CE | CLR_CE_CNT
>    b. ECCCLR = 0
>    c. ECCCLR = EN_CE | EN_UE
> So if the IRQ disabling procedure is called concurrently with the IRQ
> handler method the IRQ might be actually left enabled due to the
> statement 3c.
> 
> The root cause of the problem is that ECCCLR register (which since v3.10a
> has been called as ECCCTL) has intermixed ECC status data clear flags and
> the IRQ enable/disable flags. Thus the IRQ disabling (clear EN flags) and
> handling (write 1 to clear ECC status data) procedures must be serialised
> around the ECCCTL register modification to prevent the race.
> 
> So fix the problem described above by adding the spin-lock around the
> ECCCLR modifications and preventing the IRQ-handler from modifying the
> IRQs enable flags (there is no point in disabling the IRQ and then
> re-enabling it again within a single IRQ handler call, see the statements
> 3a/3b and 3c above).

So I'm looking at the code and am looking at this and wondering how we
even ended up in this mess?!

An interrupt handler should not *enable* the interrupt again - that's
just crazy. And I should've seen that in

  4bcffe941758 ("EDAC/synopsys: Re-enable the error interrupts on v3 hw")

and stopped it right there. But well, it is what it is...

So I'd like to see the following flow:

* on init, the interrupt is enabled with enable_intr() *after*
registering the interrupt handler.

* on exit, the interrupt is disabled with disable_intr() and then no
interrupts are coming in anymore.

And then I don't think you'll need the spinlock and it'll be sane
design.

Right?
Serge Semin April 16, 2024, 10:06 a.m. UTC | #2
On Mon, Apr 15, 2024 at 08:36:16PM +0200, Borislav Petkov wrote:
> On Thu, Feb 22, 2024 at 09:12:46PM +0300, Serge Semin wrote:
> > The race condition around the ECCCLR register access happens in the IRQ
> > disable method called in the device remove() procedure and in the ECC IRQ
> > handler:
> > 1. Enable IRQ:
> >    a. ECCCLR = EN_CE | EN_UE
> > 2. Disable IRQ:
> >    a. ECCCLR = 0
> > 3. IRQ handler:
> >    a. ECCCLR = CLR_CE | CLR_CE_CNT | CLR_CE | CLR_CE_CNT
> >    b. ECCCLR = 0
> >    c. ECCCLR = EN_CE | EN_UE
> > So if the IRQ disabling procedure is called concurrently with the IRQ
> > handler method the IRQ might be actually left enabled due to the
> > statement 3c.
> > 
> > The root cause of the problem is that ECCCLR register (which since v3.10a
> > has been called as ECCCTL) has intermixed ECC status data clear flags and
> > the IRQ enable/disable flags. Thus the IRQ disabling (clear EN flags) and
> > handling (write 1 to clear ECC status data) procedures must be serialised
> > around the ECCCTL register modification to prevent the race.
> > 
> > So fix the problem described above by adding the spin-lock around the
> > ECCCLR modifications and preventing the IRQ-handler from modifying the
> > IRQs enable flags (there is no point in disabling the IRQ and then
> > re-enabling it again within a single IRQ handler call, see the statements
> > 3a/3b and 3c above).
> 

> So I'm looking at the code and am looking at this and wondering how we
> even ended up in this mess?!
> 
> An interrupt handler should not *enable* the interrupt again - that's
> just crazy. And I should've seen that in
> 
>   4bcffe941758 ("EDAC/synopsys: Re-enable the error interrupts on v3 hw")
> 
> and stopped it right there. But well, it is what it is...

It looks indeed crazy because the method is called enable_intr() and
is called in the IRQ handler. Right, re-enabling the IRQ in the handler
doesn't look good. But under the hood it was just a way to fix the
problem described in the commit you cited. enable_intr() just gets
back the IRQ Enable flags cleared a bit before in the
zynqmp_get_error_info() method.

The root cause of the problem is that the IRQ status/clear flags:
ECCCLR.ecc_corrected_err_clr	(R/W1C)
ECCCLR.ecc_uncorrected_err_clr	(R/W1C)
ECCCLR.ecc_corr_err_cnt_clr	(R/W1C)
ECCCLR.ecc_uncorr_err_cnt_clr	(R/W1C)
etc

and the IRQ enable/disable flags (since v3.10a):
ECCCLR.ecc_corrected_err_intr_en	(R/W)
ECCCLR.ecc_uncorrected_err_intr_en	(R/W)

reside in a single register - ECCCLR (Synopsys has renamed it to
ECCCTL since v3.10a due to adding the IRQ En/Dis flags).

Thus any concurrent access to that CSR like "Clear IRQ
status/counters" and "IRQ disable/enable" need to be protected from
the race condition.

> 
> So I'd like to see the following flow:
> 
> * on init, the interrupt is enabled with enable_intr() *after*
> registering the interrupt handler.
> 
> * on exit, the interrupt is disabled with disable_intr() and then no
> interrupts are coming in anymore.
> 
> And then I don't think you'll need the spinlock and it'll be sane
> design.
> 
> Right?

This is what is implemented at the moment and it's racy. IRQ-handler
clears the IRQ status/counters by writing 1s to the respective flags
in the ECCCLR register. This inevitable causes writing to the IRQ
Enable/disable bits. Thus if during the IRQ handling the driver/device
gets to be removed (disable_intr() is called), the IRQ might be left
enabled despite of having the disable_intr() method executed. In its
turn that may cause fatal problems if the IRQ handler is executed once
again before the IRQ line is freed and disabled in the GIC side.

If we wish to avoid using the atomic spin-lock we'll need to change
the order:
0. Enable ECC IRQ in ECCCLR/ECCCTL CSR
1. Request IRQ line and register the IRQ handler
...
2. Free IRQ line
3. Disable ECC IRQ in ECCCLR/ECCCTL CSR

But if that path is decided to be taken the next aspects will need to
be taken into account:

1. If the IRQ line is shared, then the ECC IRQ might be delivered
somewhere between steps 0 and 1, which won't make the IRQ subsystem
happy. (Although this isn't actual for the current driver because it
requests the non-shared IRQ line for ECC, but who knows how the
situation will change in future).

2. A bit later (in the patchset #3) I am adding the ECC Scrubber
support, which will need the spin-lock anyway to protect another two
CSRs access. These CSRs are touched in run-time to enable/disable the
scrubbing and set/get the scrub rate.

So since we'll need to have a spin-lock anyway and from the
scalability point of view, it sounds reasonable to keep what is
suggested in my patch. What do you think?

-Serge(y)

> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
Borislav Petkov April 21, 2024, 10:07 a.m. UTC | #3
On Tue, Apr 16, 2024 at 01:06:11PM +0300, Serge Semin wrote:
> It looks indeed crazy because the method is called enable_intr() and
> is called in the IRQ handler. Right, re-enabling the IRQ in the handler
> doesn't look good. But under the hood it was just a way to fix the
> problem described in the commit you cited. enable_intr() just gets
> back the IRQ Enable flags cleared a bit before in the
> zynqmp_get_error_info() method.
> 
> The root cause of the problem is that the IRQ status/clear flags:
> ECCCLR.ecc_corrected_err_clr	(R/W1C)
> ECCCLR.ecc_uncorrected_err_clr	(R/W1C)
> ECCCLR.ecc_corr_err_cnt_clr	(R/W1C)
> ECCCLR.ecc_uncorr_err_cnt_clr	(R/W1C)
> etc
> 
> and the IRQ enable/disable flags (since v3.10a):
> ECCCLR.ecc_corrected_err_intr_en	(R/W)
> ECCCLR.ecc_uncorrected_err_intr_en	(R/W)
> 
> reside in a single register - ECCCLR (Synopsys has renamed it to
> ECCCTL since v3.10a due to adding the IRQ En/Dis flags).
> 
> Thus any concurrent access to that CSR like "Clear IRQ
> status/counters" and "IRQ disable/enable" need to be protected from
> the race condition.

Ok, let's pick this apart one-by-one. I'll return to the rest you're
explaining as needed.

So, can writes to the status/counter bits while writing the *same* bit
to the IRQ enable/disable bit prevent any race conditions?

Meaning, you only change the status and counter bits and you preserve
the same value in the IRQ disable/enable bit?

IOW, I'm thinking of shadowing that ECCCTL in software so that we update
it from the shadowed value.

Because, AFAIU, the spinlock won't help if you grab it, clear the
status/counter bits and disable the interrupt in the process. You want
to only clear the status/counter bits and leave the interrupt enabled.

Right?

IOW, in one single write you do:

ECCCLR.ecc_corrected_err_clr=1
ECCCLR.ecc_uncorrected_err_clr=1
ECCCLR.ecc_corr_err_cnt_clr=1
ECCCLR.ecc_uncorr_err_cnt_clr=1
ECCCLR.ecc_corrected_err_intr_en=1
ECCCLR.ecc_uncorrected_err_intr_en=1

?

Thx.
Serge Semin April 25, 2024, 12:52 p.m. UTC | #4
On Sun, Apr 21, 2024 at 12:07:30PM +0200, Borislav Petkov wrote:
> On Tue, Apr 16, 2024 at 01:06:11PM +0300, Serge Semin wrote:
> > It looks indeed crazy because the method is called enable_intr() and
> > is called in the IRQ handler. Right, re-enabling the IRQ in the handler
> > doesn't look good. But under the hood it was just a way to fix the
> > problem described in the commit you cited. enable_intr() just gets
> > back the IRQ Enable flags cleared a bit before in the
> > zynqmp_get_error_info() method.
> > 
> > The root cause of the problem is that the IRQ status/clear flags:
> > ECCCLR.ecc_corrected_err_clr	(R/W1C)
> > ECCCLR.ecc_uncorrected_err_clr	(R/W1C)
> > ECCCLR.ecc_corr_err_cnt_clr	(R/W1C)
> > ECCCLR.ecc_uncorr_err_cnt_clr	(R/W1C)
> > etc
> > 
> > and the IRQ enable/disable flags (since v3.10a):
> > ECCCLR.ecc_corrected_err_intr_en	(R/W)
> > ECCCLR.ecc_uncorrected_err_intr_en	(R/W)
> > 
> > reside in a single register - ECCCLR (Synopsys has renamed it to
> > ECCCTL since v3.10a due to adding the IRQ En/Dis flags).
> > 
> > Thus any concurrent access to that CSR like "Clear IRQ
> > status/counters" and "IRQ disable/enable" need to be protected from
> > the race condition.
> 
> Ok, let's pick this apart one-by-one. I'll return to the rest you're
> explaining as needed.
> 
> So, can writes to the status/counter bits while writing the *same* bit
> to the IRQ enable/disable bit prevent any race conditions?

No, because the clear and enable/disable bits belong to the same CSR.
While you are writing the clear+same/enable bits, the concurrent IO
may have changed the same/enable bits. Like this:

     IRQ-handler                        |    IRQ-disabler
                                        |
 tmp = clear_sts_bits | enable_irq_bits;|
                                        | ECCCLR = 0; // disable IRQ
 ECCCLR = tmp;                          |
----------------------------------------+--------------------------------------

As a result even though the IRQ-disabler cleared the IRQ-enable bits,
the IRQ-handler got them back to being set. The same will happen if we
get to write the *same* bits in the handler:

     IRQ-handler                        |    IRQ-disabler
                                        |
 tmp = ECCCLR | clear_sts_bits;         |
                                        | ECCCLR = 0; // disable IRQs
 ECCCLR = tmp;                          |
----------------------------------------+--------------------------------------

The last example is almost the same as what happens at the moment and
what I am fixing in this patch. The difference is that there is a
greater number of ECCCLR CSR changes performed in the IRQ-handler
context, which makes the critical section even wider than it could be:

     IRQ-handler                        |    IRQ-disabler
                                        |
zynqmp_get_error_info:                  |
 ECCCLR = clear_sts_bits;               |
 ECCCLR = 0; // actually redundant      |
...                                     | ECCCLR = 0; // disable IRQs
enable_intr:                            |
 ECCCLR = enable_irq_bits;              |
----------------------------------------+--------------------------------------

> 
> Meaning, you only change the status and counter bits and you preserve
> the same value in the IRQ disable/enable bit?

AFAICS this won't help to solve the race condition because writing the
preserved value of the enable/disable bits is the cause of the race
condition. The critical section is in concurrent flushing of different
values to the ECCCLR.*en bits. The only ways to solve that are:
1. prevent the concurrent access
2. serialize the critical section

> 
> IOW, I'm thinking of shadowing that ECCCTL in software so that we update
> it from the shadowed value.

I don't see the shadowing will help to prevent what is happening
unless you know some shadow-register pattern I am not aware of. AFAIR
the shadow register is normally utilized for the cases of:
1. read ops returns an incorrect value or a CSR couldn't be read
2. IO bus is too slow in order to speed-up the RMW-pattern
In any case the shadowed value and the process of the data flushing
would need to be protected with a lock anyway in order to sync the
shadow register content and the actual value written to the CSR.

> 
> Because, AFAIU, the spinlock won't help if you grab it, clear the
> status/counter bits and disable the interrupt in the process. You want
> to only clear the status/counter bits and leave the interrupt enabled.
> 
> Right?

Right, but the spinlock will help. What I need to do deal with two
concurrent operations:
IRQ-handler:  clear the status/counter bits and leave the IRQ enable
              bits as is.
IRQ-disabler: clear the IRQ enable bits
These actions need to be serialized in order to prevent the race
condition.

> 
> IOW, in one single write you do:
> 
> ECCCLR.ecc_corrected_err_clr=1
> ECCCLR.ecc_uncorrected_err_clr=1
> ECCCLR.ecc_corr_err_cnt_clr=1
> ECCCLR.ecc_uncorr_err_cnt_clr=1
> ECCCLR.ecc_corrected_err_intr_en=1
> ECCCLR.ecc_uncorrected_err_intr_en=1
> 
> ?

This won't be help because the concurrent IRQ-disabler could have
already cleared the IRQ enable bits while the IRQ-handler is being
executed and about to write to the ECCCLR register. Like this:

     IRQ-handler                        |    IRQ-disabler
                                        |
 tmp = clear_sts_bits | enable_irq_bits;|
                                        | ECCCLR = 0; // disable IRQ
 ECCCLR = tmp;                          |
----------------------------------------+--------------------------------------

Even if we get to add the spin-lock serializing the ECCCLR writes it
won't solve the problem since the IRQ-disabler critical section could
be executed a bit before the IRQ-handler critical section so the later
one will just re-enable the IRQs disabled by the former one.

Here is what is suggested in my patch to fix the problem:

     IRQ-handler                        |    IRQ-disabler
                                        |
zynqmp_get_error_info:                  |
                                        | lock_irqsave
                                        | ECCCLR = 0; // disable IRQs
                                        | unlock_irqrestore
 lock_irqsave;                          |
 tmp = ECCCLR | clear_sts_bits;         |
 ECCCLR = tmp;                          |
 unlock_irqrestore;                     |
----------------------------------------+--------------------------------------

See, the IRQ-status/counters clearing and IRQ disabling processes are
serialized so the former one wouldn't override the values written by
the later one.

Here is the way it would have looked in case of the shadow-register
implementation:

     IRQ-handler                        |    IRQ-disabler
                                        |
zynqmp_get_error_info:                  |
                                        | lock_irqsave
                                        | shadow_en_bits = 0;
                                        | ECCCLR = shadow_en_bits; // disable IRQs
                                        | unlock_irqrestore
 lock_irqsave;                          |
 tmp = clear_sts_bits | shadow_en_bits; |
 ECCCLR = tmp;                          |
 unlock_irqrestore;                     |
----------------------------------------+--------------------------------------

The shadow-register pattern just prevents one ECCCLR read op. The
shadowed data sync would have needed the serialization anyway. Seeing
the DW DDR uMCTL2 controller CSRs are always memory mapped, I don't
see using the shadow-register CSR would worth being implemented unless
you meant something different.

-Serge(y)

> 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
Borislav Petkov May 6, 2024, 10:20 a.m. UTC | #5
On Thu, Apr 25, 2024 at 03:52:38PM +0300, Serge Semin wrote:
> Even if we get to add the spin-lock serializing the ECCCLR writes it
> won't solve the problem since the IRQ-disabler critical section could
> be executed a bit before the IRQ-handler critical section so the later
> one will just re-enable the IRQs disabled by the former one.
> 
> Here is what is suggested in my patch to fix the problem:
> 
>      IRQ-handler                        |    IRQ-disabler
>                                         |
> zynqmp_get_error_info:                  |
>                                         | lock_irqsave
>                                         | ECCCLR = 0; // disable IRQs
>                                         | unlock_irqrestore
>  lock_irqsave;                          |
>  tmp = ECCCLR | clear_sts_bits;         |
>  ECCCLR = tmp;                          |
>  unlock_irqrestore;                     |

<--- I'm presuming here the IRQ-disabler will reenable interrupts at
some point?

Otherwise we have the same problem as before when interrupts remain off
after the IRQ handler has run.

Other than that, yes, I see it, we will need the locking.

Thanks for elaborating.
Serge Semin May 6, 2024, 11:27 a.m. UTC | #6
On Mon, May 06, 2024 at 12:20:29PM +0200, Borislav Petkov wrote:
> On Thu, Apr 25, 2024 at 03:52:38PM +0300, Serge Semin wrote:
> > Even if we get to add the spin-lock serializing the ECCCLR writes it
> > won't solve the problem since the IRQ-disabler critical section could
> > be executed a bit before the IRQ-handler critical section so the later
> > one will just re-enable the IRQs disabled by the former one.
> > 
> > Here is what is suggested in my patch to fix the problem:
> > 
> >      IRQ-handler                        |    IRQ-disabler
> >                                         |
> > zynqmp_get_error_info:                  |
> >                                         | lock_irqsave
> >                                         | ECCCLR = 0; // disable IRQs
> >                                         | unlock_irqrestore
> >  lock_irqsave;                          |
> >  tmp = ECCCLR | clear_sts_bits;         |
> >  ECCCLR = tmp;                          |
> >  unlock_irqrestore;                     |
> 

> <--- I'm presuming here the IRQ-disabler will reenable interrupts at
> some point?
> 
> Otherwise we have the same problem as before when interrupts remain off
> after the IRQ handler has run.

In the sketch above the IRQ-disabler is the method which disables the
IRQ in the concurrent manner. After my patch is applied the
IRQ-handler will no longer touch the IRQ enable/disable bits, but will
preserve them as is:
-       clearval = ECC_CTRL_CLR_CE_ERR | ECC_CTRL_CLR_CE_ERRCNT;
-       clearval |= ECC_CTRL_CLR_UE_ERR | ECC_CTRL_CLR_UE_ERRCNT;
+       spin_lock_irqsave(&priv->reglock, flags);
+
+       clearval = readl(base + ECC_CLR_OFST) |
+                  ECC_CTRL_CLR_CE_ERR | ECC_CTRL_CLR_CE_ERRCNT |
+                  ECC_CTRL_CLR_UE_ERR | ECC_CTRL_CLR_UE_ERRCNT;
        writel(clearval, base + ECC_CLR_OFST);
-       writel(0x0, base + ECC_CLR_OFST);
+
+       spin_unlock_irqrestore(&priv->reglock, flags);

Thus there won't be need in the IRQs re-enabling later in the handler:

@@ -576,8 +601,6 @@ static irqreturn_t intr_handler(int irq, void *dev_id)
        /* v3.0 of the controller does not have this register */
        if (!(priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR))
                writel(regval, priv->baseaddr + DDR_QOS_IRQ_STAT_OFST);
-       else
-               enable_intr(priv);

So the only IRQ-disabler left in the driver - disable_intr() - will be
called from the device/driver remove() function. The ECCCLR CSR access
will be guarded with the spin-lock in the IRQ-disabler and in the
IRQ-handler. So it will be safe to have them executed concurrently.

> 
> Other than that, yes, I see it, we will need the locking.
> 
> Thanks for elaborating.

Always welcome. Glad we've settled this.)

-Serge(y)

> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
Borislav Petkov May 6, 2024, 12:12 p.m. UTC | #7
On Mon, May 06, 2024 at 02:27:50PM +0300, Serge Semin wrote:
> Always welcome. Glad we've settled this.)

Yap, it looks good so far.

Lemme queue it into urgent and send it Linuswards soon-ish.

Thx.
diff mbox series

Patch

diff --git a/drivers/edac/synopsys_edac.c b/drivers/edac/synopsys_edac.c
index 709babce43ba..0168b05e3ca1 100644
--- a/drivers/edac/synopsys_edac.c
+++ b/drivers/edac/synopsys_edac.c
@@ -9,6 +9,7 @@ 
 #include <linux/edac.h>
 #include <linux/module.h>
 #include <linux/platform_device.h>
+#include <linux/spinlock.h>
 #include <linux/interrupt.h>
 #include <linux/of.h>
 
@@ -299,6 +300,7 @@  struct synps_ecc_status {
 /**
  * struct synps_edac_priv - DDR memory controller private instance data.
  * @baseaddr:		Base address of the DDR controller.
+ * @reglock:		Concurrent CSRs access lock.
  * @message:		Buffer for framing the event specific info.
  * @stat:		ECC status information.
  * @p_data:		Platform data.
@@ -313,6 +315,7 @@  struct synps_ecc_status {
  */
 struct synps_edac_priv {
 	void __iomem *baseaddr;
+	spinlock_t reglock;
 	char message[SYNPS_EDAC_MSG_SIZE];
 	struct synps_ecc_status stat;
 	const struct synps_platform_data *p_data;
@@ -408,7 +411,8 @@  static int zynq_get_error_info(struct synps_edac_priv *priv)
 static int zynqmp_get_error_info(struct synps_edac_priv *priv)
 {
 	struct synps_ecc_status *p;
-	u32 regval, clearval = 0;
+	u32 regval, clearval;
+	unsigned long flags;
 	void __iomem *base;
 
 	base = priv->baseaddr;
@@ -452,10 +456,14 @@  static int zynqmp_get_error_info(struct synps_edac_priv *priv)
 	p->ueinfo.blknr = (regval & ECC_CEADDR1_BLKNR_MASK);
 	p->ueinfo.data = readl(base + ECC_UESYND0_OFST);
 out:
-	clearval = ECC_CTRL_CLR_CE_ERR | ECC_CTRL_CLR_CE_ERRCNT;
-	clearval |= ECC_CTRL_CLR_UE_ERR | ECC_CTRL_CLR_UE_ERRCNT;
+	spin_lock_irqsave(&priv->reglock, flags);
+
+	clearval = readl(base + ECC_CLR_OFST) |
+		   ECC_CTRL_CLR_CE_ERR | ECC_CTRL_CLR_CE_ERRCNT |
+		   ECC_CTRL_CLR_UE_ERR | ECC_CTRL_CLR_UE_ERRCNT;
 	writel(clearval, base + ECC_CLR_OFST);
-	writel(0x0, base + ECC_CLR_OFST);
+
+	spin_unlock_irqrestore(&priv->reglock, flags);
 
 	return 0;
 }
@@ -515,24 +523,41 @@  static void handle_error(struct mem_ctl_info *mci, struct synps_ecc_status *p)
 
 static void enable_intr(struct synps_edac_priv *priv)
 {
+	unsigned long flags;
+
 	/* Enable UE/CE Interrupts */
-	if (priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR)
-		writel(DDR_UE_MASK | DDR_CE_MASK,
-		       priv->baseaddr + ECC_CLR_OFST);
-	else
+	if (!(priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR)) {
 		writel(DDR_QOSUE_MASK | DDR_QOSCE_MASK,
 		       priv->baseaddr + DDR_QOS_IRQ_EN_OFST);
 
+		return;
+	}
+
+	spin_lock_irqsave(&priv->reglock, flags);
+
+	writel(DDR_UE_MASK | DDR_CE_MASK,
+	       priv->baseaddr + ECC_CLR_OFST);
+
+	spin_unlock_irqrestore(&priv->reglock, flags);
 }
 
 static void disable_intr(struct synps_edac_priv *priv)
 {
+	unsigned long flags;
+
 	/* Disable UE/CE Interrupts */
-	if (priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR)
-		writel(0x0, priv->baseaddr + ECC_CLR_OFST);
-	else
+	if (!(priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR)) {
 		writel(DDR_QOSUE_MASK | DDR_QOSCE_MASK,
 		       priv->baseaddr + DDR_QOS_IRQ_DB_OFST);
+
+		return;
+	}
+
+	spin_lock_irqsave(&priv->reglock, flags);
+
+	writel(0, priv->baseaddr + ECC_CLR_OFST);
+
+	spin_unlock_irqrestore(&priv->reglock, flags);
 }
 
 /**
@@ -576,8 +601,6 @@  static irqreturn_t intr_handler(int irq, void *dev_id)
 	/* v3.0 of the controller does not have this register */
 	if (!(priv->p_data->quirks & DDR_ECC_INTR_SELF_CLEAR))
 		writel(regval, priv->baseaddr + DDR_QOS_IRQ_STAT_OFST);
-	else
-		enable_intr(priv);
 
 	return IRQ_HANDLED;
 }
@@ -1359,6 +1382,7 @@  static int mc_probe(struct platform_device *pdev)
 	priv = mci->pvt_info;
 	priv->baseaddr = baseaddr;
 	priv->p_data = p_data;
+	spin_lock_init(&priv->reglock);
 
 	mc_init(mci, pdev);