mbox series

[0/2] EDAC, ghes: Fix use after free and add reference

Message ID 20191014171919.85044-1-james.morse@arm.com (mailing list archive)
Headers show
Series EDAC, ghes: Fix use after free and add reference | expand

Message

James Morse Oct. 14, 2019, 5:19 p.m. UTC
Hello,

ghes_edac can only be registered once, later attempts will silently
do nothing as the driver is already setup. The unregister path also
only expects to be called once, but doesn't check.

This leads to KASAN splats if multiple GHES entries are unregistered,
as the free()d memory is dereferenced, and if we're lucky, free()d
a second time.

Link: lore.kernel.org/r/304df85b-8b56-b77e-1a11-aa23769f2e7c@huawei.com

Patch 1 is the minimum needed to prevent the dereference and double
free, but this does expose the lack of symmetry. If we unregister
one GHES entry, subsequent notifications will be lost.
Unregistering is unsafe if another CPU is using the free()d memory in
ghes_edac_report_mem_error().

To fix this, Patch 2 uses ghes_init as a reference count.

We can now unbind all the GHES entries, causing ghes_edac to be
unregistered, and start rebinding them again.


Thanks,

James Morse (2):
  EDAC, ghes: Fix Use after free in ghes_edac remove path
  EDAC, ghes: Reference count GHES users of ghes_edac

 drivers/edac/ghes_edac.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Borislav Petkov Oct. 14, 2019, 5:30 p.m. UTC | #1
On Mon, Oct 14, 2019 at 06:19:17PM +0100, James Morse wrote:
> ghes_edac can only be registered once, later attempts will silently
> do nothing as the driver is already setup. The unregister path also
> only expects to be called once, but doesn't check.
> 
> This leads to KASAN splats if multiple GHES entries are unregistered,
> as the free()d memory is dereferenced, and if we're lucky, free()d
> a second time.
> 
> Link: lore.kernel.org/r/304df85b-8b56-b77e-1a11-aa23769f2e7c@huawei.com
> 
> Patch 1 is the minimum needed to prevent the dereference and double
> free, but this does expose the lack of symmetry. If we unregister
> one GHES entry, subsequent notifications will be lost.
> Unregistering is unsafe if another CPU is using the free()d memory in
> ghes_edac_report_mem_error().
> 
> To fix this, Patch 2 uses ghes_init as a reference count.
> 
> We can now unbind all the GHES entries, causing ghes_edac to be
> unregistered, and start rebinding them again.
> 
> 
> Thanks,
> 
> James Morse (2):
>   EDAC, ghes: Fix Use after free in ghes_edac remove path
>   EDAC, ghes: Reference count GHES users of ghes_edac

Thanks for debugging and fixing this but why two patches?

One is perfectly fine IMO. Or?

Btw, I admit that the ghes_init thing is ugly as hell. ;-\

I wonder if it could be ripped out completely and we use only ghes_pvt
for controlling the single driver instance loading and unloading...
James Morse Oct. 14, 2019, 5:40 p.m. UTC | #2
Hi Boris,

On 14/10/2019 18:30, Borislav Petkov wrote:
> On Mon, Oct 14, 2019 at 06:19:17PM +0100, James Morse wrote:
>> ghes_edac can only be registered once, later attempts will silently
>> do nothing as the driver is already setup. The unregister path also
>> only expects to be called once, but doesn't check.
>>
>> This leads to KASAN splats if multiple GHES entries are unregistered,
>> as the free()d memory is dereferenced, and if we're lucky, free()d
>> a second time.
>>
>> Link: lore.kernel.org/r/304df85b-8b56-b77e-1a11-aa23769f2e7c@huawei.com
>>
>> Patch 1 is the minimum needed to prevent the dereference and double
>> free, but this does expose the lack of symmetry. If we unregister
>> one GHES entry, subsequent notifications will be lost.
>> Unregistering is unsafe if another CPU is using the free()d memory in
>> ghes_edac_report_mem_error().
>>
>> To fix this, Patch 2 uses ghes_init as a reference count.
>>
>> We can now unbind all the GHES entries, causing ghes_edac to be

> Thanks for debugging and fixing this but why two patches?
> 
> One is perfectly fine IMO. Or?

They can be merged together if you prefer.

Keeping both avoids the !pvt check in ghes_edac_report_mem_error() going wrong, but I'm
not entirely sure what that is trying to stop...

The possibility of races between notifications and unregister only occurred to me after
testing the first patch, so they ended up as different things in my head: I thought it
deserved its own commit log as its unrelated to the KASAN splat.


> Btw, I admit that the ghes_init thing is ugly as hell. ;-\

> I wonder if it could be ripped out completely and we use only ghes_pvt
> for controlling the single driver instance loading and unloading...

I think you need some kind of reference count to know how many more GHES.x there are out
there that may call unregister, otherwise you race with notifications.


Thanks,

James
Borislav Petkov Oct. 14, 2019, 5:53 p.m. UTC | #3
On Mon, Oct 14, 2019 at 06:40:33PM +0100, James Morse wrote:
> Keeping both avoids the !pvt check in ghes_edac_report_mem_error() going wrong, but I'm
> not entirely sure what that is trying to stop...

It must've been some sanity-check in

  f04c62a7036a ("ghes_edac: add support for reporting errors via EDAC")

> The possibility of races between notifications and unregister only occurred to me after
> testing the first patch, so they ended up as different things in my head: I thought it
> deserved its own commit log as its unrelated to the KASAN splat.

To me they fix two different aspects of the missing counting on unreg.

> I think you need some kind of reference count to know how many more GHES.x there are out
> there that may call unregister, otherwise you race with notifications.

Provided unregister cannot be called concurrently, the

        if (!ghes_pvt)
                return;

in ghes_edac_unregister() should suffice.

But just to be on the safe side, it could get an "instance_mutex" or so
under which ghes_pvt is set and cleared and then that silly counter can
simply go away.

Thoughts?
Borislav Petkov Oct. 15, 2019, 1:25 p.m. UTC | #4
On Mon, Oct 14, 2019 at 06:19:17PM +0100, James Morse wrote:
> Hello,
> 
> ghes_edac can only be registered once, later attempts will silently
> do nothing as the driver is already setup. The unregister path also
> only expects to be called once, but doesn't check.
> 
> This leads to KASAN splats if multiple GHES entries are unregistered,
> as the free()d memory is dereferenced, and if we're lucky, free()d
> a second time.
> 
> Link: lore.kernel.org/r/304df85b-8b56-b77e-1a11-aa23769f2e7c@huawei.com
> 
> Patch 1 is the minimum needed to prevent the dereference and double
> free, but this does expose the lack of symmetry. If we unregister
> one GHES entry, subsequent notifications will be lost.
> Unregistering is unsafe if another CPU is using the free()d memory in
> ghes_edac_report_mem_error().
> 
> To fix this, Patch 2 uses ghes_init as a reference count.
> 
> We can now unbind all the GHES entries, causing ghes_edac to be
> unregistered, and start rebinding them again.
> 
> 
> Thanks,
> 
> James Morse (2):
>   EDAC, ghes: Fix Use after free in ghes_edac remove path
>   EDAC, ghes: Reference count GHES users of ghes_edac
> 
>  drivers/edac/ghes_edac.c | 4 ++++
>  1 file changed, 4 insertions(+)

Patches squashed and queued here:

https://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git/log/?h=edac-urgent

Will send it to Linus soonish as it is stable material, I guess.

In the meantime, we can iron the whole deal out and improve it.

Thx.
Borislav Petkov Oct. 16, 2019, 3:17 p.m. UTC | #5
On Mon, Oct 14, 2019 at 07:53:19PM +0200, Borislav Petkov wrote:
> Provided unregister cannot be called concurrently, the
> 
>         if (!ghes_pvt)
>                 return;
> 
> in ghes_edac_unregister() should suffice.
> 
> But just to be on the safe side, it could get an "instance_mutex" or so
> under which ghes_pvt is set and cleared and then that silly counter can
> simply go away.
> 
> Thoughts?

IOW, something simple and straight-forward like this:

---
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 0bb62857ffb2..b600f010fa0e 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -26,9 +26,11 @@ struct ghes_edac_pvt {
 	char msg[80];
 };
 
-static atomic_t ghes_init = ATOMIC_INIT(0);
 static struct ghes_edac_pvt *ghes_pvt;
 
+/* GHES instances reg/dereg mutex */
+static DEFINE_MUTEX(ghes_reg_mutex);
+
 /*
  * Sync with other, potentially concurrent callers of
  * ghes_edac_report_mem_error(). We don't know what the
@@ -461,7 +463,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	struct mem_ctl_info *mci;
 	struct edac_mc_layer layers[1];
 	struct ghes_edac_dimm_fill dimm_fill;
-	int idx = -1;
+	int idx = -1, err = 0;
 
 	if (IS_ENABLED(CONFIG_X86)) {
 		/* Check if safe to enable on this system */
@@ -472,11 +474,13 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		idx = 0;
 	}
 
+	mutex_lock(&ghes_reg_mutex);
+
 	/*
 	 * We have only one logical memory controller to which all DIMMs belong.
 	 */
-	if (atomic_inc_return(&ghes_init) > 1)
-		return 0;
+	if (ghes_pvt)
+		goto unlock;
 
 	/* Get the number of DIMMs */
 	dmi_walk(ghes_edac_count_dimms, &num_dimm);
@@ -494,7 +498,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
 	if (!mci) {
 		pr_info("Can't allocate memory for EDAC data\n");
-		return -ENOMEM;
+		err = -ENOMEM;
+		goto unlock;
 	}
 
 	ghes_pvt	= mci->pvt_info;
@@ -541,23 +546,29 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	if (rc < 0) {
 		pr_info("Can't register at EDAC core\n");
 		edac_mc_free(mci);
-		return -ENODEV;
+		err = -ENODEV;
 	}
-	return 0;
+
+unlock:
+	mutex_unlock(&ghes_reg_mutex);
+
+	return err;
 }
 
 void ghes_edac_unregister(struct ghes *ghes)
 {
 	struct mem_ctl_info *mci;
 
-	if (!ghes_pvt)
-		return;
+	mutex_lock(&ghes_reg_mutex);
 
-	if (atomic_dec_return(&ghes_init))
-		return;
+	if (!ghes_pvt)
+		goto unlock;
 
 	mci = ghes_pvt->mci;
 	ghes_pvt = NULL;
 	edac_mc_del_mc(mci->pdev);
 	edac_mc_free(mci);
+
+unlock:
+	mutex_unlock(&ghes_reg_mutex);
 }
James Morse Oct. 16, 2019, 3:30 p.m. UTC | #6
Hi Boris,

On 16/10/2019 16:17, Borislav Petkov wrote:
> On Mon, Oct 14, 2019 at 07:53:19PM +0200, Borislav Petkov wrote:
>> Provided unregister cannot be called concurrently, the
>>
>>         if (!ghes_pvt)
>>                 return;
>>
>> in ghes_edac_unregister() should suffice.
>>
>> But just to be on the safe side, it could get an "instance_mutex" or so
>> under which ghes_pvt is set and cleared and then that silly counter can
>> simply go away.
>>
>> Thoughts?
> 
> IOW, something simple and straight-forward like this:


> ---
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 0bb62857ffb2..b600f010fa0e 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -26,9 +26,11 @@ struct ghes_edac_pvt {
>  	char msg[80];
>  };
>  
> -static atomic_t ghes_init = ATOMIC_INIT(0);
>  static struct ghes_edac_pvt *ghes_pvt;
>  
> +/* GHES instances reg/dereg mutex */
> +static DEFINE_MUTEX(ghes_reg_mutex);
> +
>  /*
>   * Sync with other, potentially concurrent callers of
>   * ghes_edac_report_mem_error(). We don't know what the
> @@ -461,7 +463,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  	struct mem_ctl_info *mci;
>  	struct edac_mc_layer layers[1];
>  	struct ghes_edac_dimm_fill dimm_fill;
> -	int idx = -1;
> +	int idx = -1, err = 0;
>  
>  	if (IS_ENABLED(CONFIG_X86)) {
>  		/* Check if safe to enable on this system */
> @@ -472,11 +474,13 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  		idx = 0;
>  	}
>  
> +	mutex_lock(&ghes_reg_mutex);
> +
>  	/*
>  	 * We have only one logical memory controller to which all DIMMs belong.
>  	 */
> -	if (atomic_inc_return(&ghes_init) > 1)
> -		return 0;
> +	if (ghes_pvt)
> +		goto unlock;
>  
>  	/* Get the number of DIMMs */
>  	dmi_walk(ghes_edac_count_dimms, &num_dimm);
> @@ -494,7 +498,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
>  	if (!mci) {
>  		pr_info("Can't allocate memory for EDAC data\n");
> -		return -ENOMEM;
> +		err = -ENOMEM;
> +		goto unlock;
>  	}
>  
>  	ghes_pvt	= mci->pvt_info;
> @@ -541,23 +546,29 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  	if (rc < 0) {
>  		pr_info("Can't register at EDAC core\n");
>  		edac_mc_free(mci);
> -		return -ENODEV;
> +		err = -ENODEV;
>  	}
> -	return 0;
> +
> +unlock:
> +	mutex_unlock(&ghes_reg_mutex);
> +
> +	return err;
>  }

There are a few more warts we should try and get rid of with this:
ghes_edac_register() publishes the ghes_pvt pointer under the mutex, but the irq handler
reads it without taking the mutex. (obviously it can't).

ghes_edac_register() publishes the pointer before its called edac_mc_add_mc(), which is
pleasant.

(sorry I've been sitting on this while I found new and exciting ways to break my test
machine!)

Combined with the spinlocky bits of:
--------------------------%<--------------------------
From 61fa061790fe7c19af25b25693b61bb75a498058 Mon Sep 17 00:00:00 2001
From: James Morse <james.morse@arm.com>
Date: Wed, 16 Oct 2019 10:02:15 +0100
Subject: [PATCH] EDAC, ghes: Move ghes_init and ghes_pvt under the ghes_lock

ghes_edac has an irqsave spinlock that protects the contents of ghes_pvt,
but not the pointer itself. The register, unregister and irq-handler
functions read this bare global variable expecting to see NULL or the
allocated value. Without READ_ONCE()/WRITE_ONCE(), this is racy.
ghes_edac_register() also publishes the pointer before it has registered
the mci.

Replace ghes_init with an unsigned int counter in ghes_pvt. To access
this or read the ghes_pvt pointer, you must hold the ghes_lock. This
prevents races when these values are modified.

Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/edac/ghes_edac.c | 69 ++++++++++++++++++++++++++++------------
 1 file changed, 48 insertions(+), 21 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 0bb62857ffb2..804bb07c6acf 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -15,7 +15,10 @@
 #include "edac_module.h"
 #include <ras/ras_event.h>

+/* Hold ghes_lock when accessing ghes_pvt */
 struct ghes_edac_pvt {
+	unsigned int users;
+
 	struct list_head list;
 	struct ghes *ghes;
 	struct mem_ctl_info *mci;
@@ -26,7 +29,6 @@ struct ghes_edac_pvt {
 	char msg[80];
 };

-static atomic_t ghes_init = ATOMIC_INIT(0);
 static struct ghes_edac_pvt *ghes_pvt;

 /*
@@ -79,9 +81,8 @@ static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg)
 		(*num_dimm)++;
 }

-static int get_dimm_smbios_index(u16 handle)
+static int get_dimm_smbios_index(u16 handle, struct mem_ctl_info *mci)
 {
-	struct mem_ctl_info *mci = ghes_pvt->mci;
 	int i;

 	for (i = 0; i < mci->tot_dimms; i++) {
@@ -197,15 +198,12 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err
*mem_err)
 {
 	enum hw_event_mc_err_type type;
 	struct edac_raw_error_desc *e;
+	struct ghes_edac_pvt *pvt;
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt = ghes_pvt;
 	unsigned long flags;
 	char *p;
 	u8 grain_bits;

-	if (!pvt)
-		return;
-
 	/*
 	 * We can do the locking below because GHES defers error processing
 	 * from NMI to IRQ context. Whenever that changes, we'd at least
@@ -215,6 +213,11 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err
*mem_err)
 		return;

 	spin_lock_irqsave(&ghes_lock, flags);
+	pvt = ghes_pvt;
+	if (!pvt) {
+		spin_unlock_irqrestore(&ghes_lock, flags);
+		return;
+	}

 	mci = pvt->mci;
 	e = &mci->error_desc;
@@ -348,7 +351,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 			p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
 				     mem_err->mem_dev_handle);

-		index = get_dimm_smbios_index(mem_err->mem_dev_handle);
+		index = get_dimm_smbios_index(mem_err->mem_dev_handle, mci);
 		if (index >= 0) {
 			e->top_layer = index;
 			e->enable_per_layer_report = true;
@@ -457,8 +460,10 @@ static struct acpi_platform_list plat_list[] = {
 int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
 	bool fake = false;
+	unsigned long flags;
 	int rc, num_dimm = 0;
 	struct mem_ctl_info *mci;
+	struct ghes_edac_pvt *pvt;
 	struct edac_mc_layer layers[1];
 	struct ghes_edac_dimm_fill dimm_fill;
 	int idx = -1;
@@ -475,7 +480,13 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	/*
 	 * We have only one logical memory controller to which all DIMMs belong.
 	 */
-	if (atomic_inc_return(&ghes_init) > 1)
+	spin_lock_irqsave(&ghes_lock, flags);
+	pvt = ghes_pvt;
+	if (pvt)
+		pvt->users++;
+	spin_unlock_irqrestore(&ghes_lock, flags);
+
+	if (pvt)
 		return 0;

 	/* Get the number of DIMMs */
@@ -497,9 +508,10 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		return -ENOMEM;
 	}

-	ghes_pvt	= mci->pvt_info;
-	ghes_pvt->ghes	= ghes;
-	ghes_pvt->mci	= mci;
+	pvt		= mci->pvt_info;
+	pvt->ghes	= ghes;
+	pvt->mci	= mci;
+	pvt->users	= 1;

 	mci->pdev = dev;
 	mci->mtype_cap = MEM_FLAG_EMPTY;
@@ -543,21 +555,36 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		edac_mc_free(mci);
 		return -ENODEV;
 	}
+
+	spin_lock_irqsave(&ghes_lock, flags);
+	WARN_ON_ONCE(ghes_pvt);
+	ghes_pvt = pvt;
+	spin_unlock_irqrestore(&ghes_lock, flags);
+
 	return 0;
 }

 void ghes_edac_unregister(struct ghes *ghes)
 {
 	struct mem_ctl_info *mci;
+	bool do_free = false;
+	unsigned long flags;

-	if (!ghes_pvt)
-		return;
-
-	if (atomic_dec_return(&ghes_init))
-		return;
+	spin_lock_irqsave(&ghes_lock, flags);
+	if (ghes_pvt) {
+		ghes_pvt->users -= 1;
+
+		/* Are we the last user? */
+		if (!ghes_pvt->users) {
+			do_free = true;
+			mci = ghes_pvt->mci;
+			ghes_pvt = NULL;
+		}
+	}
+	spin_unlock_irqrestore(&ghes_lock, flags);

-	mci = ghes_pvt->mci;
-	ghes_pvt = NULL;
-	edac_mc_del_mc(mci->pdev);
-	edac_mc_free(mci);
+	if (do_free) {
+		edac_mc_del_mc(mci->pdev);
+		edac_mc_free(mci);
+	}
 }
Borislav Petkov Oct. 16, 2019, 6:50 p.m. UTC | #7
On Wed, Oct 16, 2019 at 04:30:24PM +0100, James Morse wrote:
> There are a few more warts we should try and get rid of with this:
> ghes_edac_register() publishes the ghes_pvt pointer under the mutex, but the irq handler
> reads it without taking the mutex. (obviously it can't).
> 
> ghes_edac_register() publishes the pointer before its called edac_mc_add_mc(), which is
> pleasant.

Yeah, before we do this, lemme try to simplify the situation more. And
yeah, we don't need the mutex - we can use the spinlock only. But let's
get rid of the need to access the pvt in the IRQ handler. Yeah, we need
the *mci pointer but one can't have it all :)

Anyway, here's what I have, it is only build-tested. I wanna give it a
stern look tomorrow, on a clear head again:

---
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 0bb62857ffb2..2075e55d49ab 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -19,15 +19,16 @@ struct ghes_edac_pvt {
 	struct list_head list;
 	struct ghes *ghes;
 	struct mem_ctl_info *mci;
-
-	/* Buffers for the error handling routine */
-	char detail_location[240];
-	char other_detail[160];
-	char msg[80];
 };
 
 static atomic_t ghes_init = ATOMIC_INIT(0);
 static struct ghes_edac_pvt *ghes_pvt;
+static struct mem_ctl_info *ghes_mci;
+
+/* Buffers for the error handling routine */
+static char detail_location[240];
+static char other_detail[160];
+static char msg[80];
 
 /*
  * Sync with other, potentially concurrent callers of
@@ -196,15 +197,10 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
 void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 {
 	enum hw_event_mc_err_type type;
-	struct edac_raw_error_desc *e;
-	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt = ghes_pvt;
+	struct edac_raw_error_desc e;
 	unsigned long flags;
-	char *p;
 	u8 grain_bits;
-
-	if (!pvt)
-		return;
+	char *p;
 
 	/*
 	 * We can do the locking below because GHES defers error processing
@@ -216,20 +212,20 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 
 	spin_lock_irqsave(&ghes_lock, flags);
 
-	mci = pvt->mci;
-	e = &mci->error_desc;
+	if (!ghes_mci)
+		goto unlock;
 
 	/* Cleans the error report buffer */
-	memset(e, 0, sizeof (*e));
-	e->error_count = 1;
-	strcpy(e->label, "unknown label");
-	e->msg = pvt->msg;
-	e->other_detail = pvt->other_detail;
-	e->top_layer = -1;
-	e->mid_layer = -1;
-	e->low_layer = -1;
-	*pvt->other_detail = '\0';
-	*pvt->msg = '\0';
+	memset(&e, 0, sizeof (e));
+	e.error_count = 1;
+	strcpy(e.label, "unknown label");
+	e.msg = msg;
+	e.other_detail = other_detail;
+	e.top_layer = -1;
+	e.mid_layer = -1;
+	e.low_layer = -1;
+	*other_detail = '\0';
+	*msg = '\0';
 
 	switch (sev) {
 	case GHES_SEV_CORRECTED:
@@ -251,7 +247,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 
 	/* Error type, mapped on e->msg */
 	if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_TYPE) {
-		p = pvt->msg;
+		p = msg;
 		switch (mem_err->error_type) {
 		case 0:
 			p += sprintf(p, "Unknown");
@@ -306,21 +302,21 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 				     mem_err->error_type);
 		}
 	} else {
-		strcpy(pvt->msg, "unknown error");
+		strcpy(msg, "unknown error");
 	}
 
 	/* Error address */
 	if (mem_err->validation_bits & CPER_MEM_VALID_PA) {
-		e->page_frame_number = mem_err->physical_addr >> PAGE_SHIFT;
-		e->offset_in_page = mem_err->physical_addr & ~PAGE_MASK;
+		e.page_frame_number = mem_err->physical_addr >> PAGE_SHIFT;
+		e.offset_in_page = mem_err->physical_addr & ~PAGE_MASK;
 	}
 
 	/* Error grain */
 	if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
-		e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
+		e.grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
 
 	/* Memory error location, mapped on e->location */
-	p = e->location;
+	p = e.location;
 	if (mem_err->validation_bits & CPER_MEM_VALID_NODE)
 		p += sprintf(p, "node:%d ", mem_err->node);
 	if (mem_err->validation_bits & CPER_MEM_VALID_CARD)
@@ -337,12 +333,13 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 		p += sprintf(p, "col:%d ", mem_err->column);
 	if (mem_err->validation_bits & CPER_MEM_VALID_BIT_POSITION)
 		p += sprintf(p, "bit_pos:%d ", mem_err->bit_pos);
+
 	if (mem_err->validation_bits & CPER_MEM_VALID_MODULE_HANDLE) {
 		const char *bank = NULL, *device = NULL;
 		int index = -1;
 
 		dmi_memdev_name(mem_err->mem_dev_handle, &bank, &device);
-		if (bank != NULL && device != NULL)
+		if (bank && device)
 			p += sprintf(p, "DIMM location:%s %s ", bank, device);
 		else
 			p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
@@ -350,16 +347,16 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 
 		index = get_dimm_smbios_index(mem_err->mem_dev_handle);
 		if (index >= 0) {
-			e->top_layer = index;
-			e->enable_per_layer_report = true;
+			e.top_layer = index;
+			e.enable_per_layer_report = true;
 		}
 
 	}
-	if (p > e->location)
+	if (p > e.location)
 		*(p - 1) = '\0';
 
 	/* All other fields are mapped on e->other_detail */
-	p = pvt->other_detail;
+	p = other_detail;
 	if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_STATUS) {
 		u64 status = mem_err->error_status;
 
@@ -421,6 +418,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 			break;
 		}
 	}
+
 	if (mem_err->validation_bits & CPER_MEM_VALID_REQUESTOR_ID)
 		p += sprintf(p, "requestorID: 0x%016llx ",
 			     (long long)mem_err->requestor_id);
@@ -430,19 +428,21 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 	if (mem_err->validation_bits & CPER_MEM_VALID_TARGET_ID)
 		p += sprintf(p, "targetID: 0x%016llx ",
 			     (long long)mem_err->responder_id);
-	if (p > pvt->other_detail)
+	if (p > other_detail)
 		*(p - 1) = '\0';
 
 	/* Generate the trace event */
-	grain_bits = fls_long(e->grain);
-	snprintf(pvt->detail_location, sizeof(pvt->detail_location),
-		 "APEI location: %s %s", e->location, e->other_detail);
-	trace_mc_event(type, e->msg, e->label, e->error_count,
-		       mci->mc_idx, e->top_layer, e->mid_layer, e->low_layer,
-		       (e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
-		       grain_bits, e->syndrome, pvt->detail_location);
-
-	edac_raw_mc_handle_error(type, mci, e);
+	grain_bits = fls_long(e.grain);
+	snprintf(detail_location, sizeof(detail_location),
+		 "APEI location: %s %s", e.location, e.other_detail);
+	trace_mc_event(type, msg, e.label, e.error_count,
+		       0, e.top_layer, e.mid_layer, e.low_layer,
+		       (e.page_frame_number << PAGE_SHIFT) | e.offset_in_page,
+		       grain_bits, e.syndrome, detail_location);
+
+	edac_raw_mc_handle_error(type, ghes_mci, &e);
+
+unlock:
 	spin_unlock_irqrestore(&ghes_lock, flags);
 }
 
@@ -500,6 +500,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	ghes_pvt	= mci->pvt_info;
 	ghes_pvt->ghes	= ghes;
 	ghes_pvt->mci	= mci;
+	ghes_mci	= mci;
 
 	mci->pdev = dev;
 	mci->mtype_cap = MEM_FLAG_EMPTY;
Borislav Petkov Oct. 21, 2019, 7:37 a.m. UTC | #8
On Wed, Oct 16, 2019 at 08:50:41PM +0200, Borislav Petkov wrote:
> Yeah, before we do this, lemme try to simplify the situation more. And
> yeah, we don't need the mutex - we can use the spinlock only. But let's
> get rid of the need to access the pvt in the IRQ handler. Yeah, we need
> the *mci pointer but one can't have it all :)
> 
> Anyway, here's what I have, it is only build-tested. I wanna give it a
> stern look tomorrow, on a clear head again:

And now I went and redid your patch ontop and thus completely got rid of
ghes_pvt in favor of having one global ghes_mci pointer.

We access it only under the lock and we publish it in
ghes_edac_register() only in the success case. Can't get any simpler
than that.

Thoughts?

I think it is a lot cleaner and straight-forward this way. Lemme know if
you wanna run it on your setup and I'll push the two patches somewhere.

Thx.

---
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index bcb6a9c579c6..e55a9cb8ab73 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -15,14 +15,6 @@
 #include "edac_module.h"
 #include <ras/ras_event.h>
 
-struct ghes_edac_pvt {
-	struct list_head list;
-	struct ghes *ghes;
-	struct mem_ctl_info *mci;
-};
-
-static atomic_t ghes_init = ATOMIC_INIT(0);
-static struct ghes_edac_pvt *ghes_pvt;
 static struct mem_ctl_info *ghes_mci;
 
 /* Buffers for the error handling routine */
@@ -80,9 +72,8 @@ static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg)
 		(*num_dimm)++;
 }
 
-static int get_dimm_smbios_index(u16 handle)
+static int get_dimm_smbios_index(struct mem_ctl_info *mci, u16 handle)
 {
-	struct mem_ctl_info *mci = ghes_pvt->mci;
 	int i;
 
 	for (i = 0; i < mci->tot_dimms; i++) {
@@ -345,7 +336,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 			p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
 				     mem_err->mem_dev_handle);
 
-		index = get_dimm_smbios_index(mem_err->mem_dev_handle);
+		index = get_dimm_smbios_index(ghes_mci, mem_err->mem_dev_handle);
 		if (index >= 0) {
 			e.top_layer = index;
 			e.enable_per_layer_report = true;
@@ -456,12 +447,13 @@ static struct acpi_platform_list plat_list[] = {
 
 int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
-	bool fake = false;
-	int rc, num_dimm = 0;
-	struct mem_ctl_info *mci;
-	struct edac_mc_layer layers[1];
 	struct ghes_edac_dimm_fill dimm_fill;
-	int idx = -1;
+	struct edac_mc_layer layers[1];
+	struct mem_ctl_info *mci;
+	int err = 0, idx = -1;
+	int rc, num_dimm = 0;
+	unsigned long flags;
+	bool fake = false;
 
 	if (IS_ENABLED(CONFIG_X86)) {
 		/* Check if safe to enable on this system */
@@ -475,8 +467,9 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	/*
 	 * We have only one logical memory controller to which all DIMMs belong.
 	 */
-	if (atomic_inc_return(&ghes_init) > 1)
-		return 0;
+	spin_lock_irqsave(&ghes_lock, flags);
+	if (ghes_mci)
+		goto unlock;
 
 	/* Get the number of DIMMs */
 	dmi_walk(ghes_edac_count_dimms, &num_dimm);
@@ -491,17 +484,13 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	layers[0].size = num_dimm;
 	layers[0].is_virt_csrow = true;
 
-	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
+	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, 0);
 	if (!mci) {
 		pr_info("Can't allocate memory for EDAC data\n");
-		return -ENOMEM;
+		err = -ENOMEM;
+		goto unlock;
 	}
 
-	ghes_pvt	= mci->pvt_info;
-	ghes_pvt->ghes	= ghes;
-	ghes_pvt->mci	= mci;
-	ghes_mci	= mci;
-
 	mci->pdev = dev;
 	mci->mtype_cap = MEM_FLAG_EMPTY;
 	mci->edac_ctl_cap = EDAC_FLAG_NONE;
@@ -542,23 +531,30 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	if (rc < 0) {
 		pr_info("Can't register at EDAC core\n");
 		edac_mc_free(mci);
-		return -ENODEV;
+		err = -ENODEV;
 	}
-	return 0;
+
+	ghes_mci = mci;
+
+unlock:
+	spin_unlock_irqrestore(&ghes_lock, flags);
+
+	return err;
 }
 
 void ghes_edac_unregister(struct ghes *ghes)
 {
-	struct mem_ctl_info *mci;
+	unsigned long flags;
 
-	if (!ghes_pvt)
-		return;
+	spin_lock_irqsave(&ghes_lock, flags);
 
-	if (atomic_dec_return(&ghes_init))
-		return;
+	if (!ghes_mci)
+		goto unlock;
 
-	mci = ghes_pvt->mci;
-	ghes_pvt = NULL;
-	edac_mc_del_mc(mci->pdev);
-	edac_mc_free(mci);
+	edac_mc_del_mc(ghes_mci->pdev);
+	edac_mc_free(ghes_mci);
+	ghes_mci = NULL;
+
+unlock:
+	spin_unlock_irqrestore(&ghes_lock, flags);
 }
Robert Richter Oct. 21, 2019, 10:52 a.m. UTC | #9
Hi Boris,

On 21.10.19 09:37:58, Borislav Petkov wrote:
> On Wed, Oct 16, 2019 at 08:50:41PM +0200, Borislav Petkov wrote:
> > Yeah, before we do this, lemme try to simplify the situation more. And
> > yeah, we don't need the mutex - we can use the spinlock only. But let's
> > get rid of the need to access the pvt in the IRQ handler. Yeah, we need
> > the *mci pointer but one can't have it all :)
> > 
> > Anyway, here's what I have, it is only build-tested. I wanna give it a
> > stern look tomorrow, on a clear head again:
> 
> And now I went and redid your patch ontop and thus completely got rid of
> ghes_pvt in favor of having one global ghes_mci pointer.

please do not do that. While we have atm only one ghes instance
including a single mci, we will need in the future multiple mci
instances to reflect the memory layout in sysfs. I sent already a
patch for this and am going to resent an updated version, see:

 https://lore.kernel.org/patchwork/patch/1093472/

> We access it only under the lock and we publish it in
> ghes_edac_register() only in the success case. Can't get any simpler
> than that.
> 
> Thoughts?

So, could we just keep all data in struct ghes_edac_pvt instead of
global data?

> I think it is a lot cleaner and straight-forward this way. Lemme know if
> you wanna run it on your setup and I'll push the two patches somewhere.

Could we please properly repost patches for review? James last one got
lost as an answer in this mail thread, but contains fundamental
changes of data structures and locking (while the both initially
posted patches contained reasonable oneliners).

I will start reviewing James' last patch now.

Thanks,

-Robert
Robert Richter Oct. 22, 2019, 1:25 p.m. UTC | #10
On 16.10.19 16:30:24, James Morse wrote:
> Hi Boris,
> 
> On 16/10/2019 16:17, Borislav Petkov wrote:
> > On Mon, Oct 14, 2019 at 07:53:19PM +0200, Borislav Petkov wrote:
> >> Provided unregister cannot be called concurrently, the
> >>
> >>         if (!ghes_pvt)
> >>                 return;

ghes_pvt should not be used at all outside the irq handler. It can
being protected by the mci device's refcount and as long mci exists,
ghes_pvt exists too (mci->pvt_info).

So a better fix is to use the mci's devices refcounter (get_device())
to prevent ghes_pvt from being freed until this pointer is invalidated
for sure. The mci can be freed then with just a put_device() call.  Of
course this needs a proper refcounting of all users and a release
handler. Since the mci or ghes_pvt is needed in the irq handler the
pointer to it can be set/unset when registering/releasing the mci.

See also below.

> >>
> >> in ghes_edac_unregister() should suffice.
> >>
> >> But just to be on the safe side, it could get an "instance_mutex" or so
> >> under which ghes_pvt is set and cleared and then that silly counter can
> >> simply go away.
> >>
> >> Thoughts?
> > 
> > IOW, something simple and straight-forward like this:
> 
> 
> > ---
> > diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> > index 0bb62857ffb2..b600f010fa0e 100644
> > --- a/drivers/edac/ghes_edac.c
> > +++ b/drivers/edac/ghes_edac.c
> > @@ -26,9 +26,11 @@ struct ghes_edac_pvt {
> >  	char msg[80];
> >  };
> >  
> > -static atomic_t ghes_init = ATOMIC_INIT(0);
> >  static struct ghes_edac_pvt *ghes_pvt;
> >  
> > +/* GHES instances reg/dereg mutex */
> > +static DEFINE_MUTEX(ghes_reg_mutex);
> > +
> >  /*
> >   * Sync with other, potentially concurrent callers of
> >   * ghes_edac_report_mem_error(). We don't know what the
> > @@ -461,7 +463,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
> >  	struct mem_ctl_info *mci;
> >  	struct edac_mc_layer layers[1];
> >  	struct ghes_edac_dimm_fill dimm_fill;
> > -	int idx = -1;
> > +	int idx = -1, err = 0;
> >  
> >  	if (IS_ENABLED(CONFIG_X86)) {
> >  		/* Check if safe to enable on this system */
> > @@ -472,11 +474,13 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
> >  		idx = 0;
> >  	}
> >  
> > +	mutex_lock(&ghes_reg_mutex);
> > +
> >  	/*
> >  	 * We have only one logical memory controller to which all DIMMs belong.
> >  	 */
> > -	if (atomic_inc_return(&ghes_init) > 1)
> > -		return 0;
> > +	if (ghes_pvt)
> > +		goto unlock;
> >  
> >  	/* Get the number of DIMMs */
> >  	dmi_walk(ghes_edac_count_dimms, &num_dimm);
> > @@ -494,7 +498,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
> >  	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
> >  	if (!mci) {
> >  		pr_info("Can't allocate memory for EDAC data\n");
> > -		return -ENOMEM;
> > +		err = -ENOMEM;
> > +		goto unlock;
> >  	}
> >  
> >  	ghes_pvt	= mci->pvt_info;
> > @@ -541,23 +546,29 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
> >  	if (rc < 0) {
> >  		pr_info("Can't register at EDAC core\n");
> >  		edac_mc_free(mci);
> > -		return -ENODEV;
> > +		err = -ENODEV;
> >  	}
> > -	return 0;
> > +
> > +unlock:
> > +	mutex_unlock(&ghes_reg_mutex);
> > +
> > +	return err;
> >  }
> 
> There are a few more warts we should try and get rid of with this:
> ghes_edac_register() publishes the ghes_pvt pointer under the mutex, but the irq handler
> reads it without taking the mutex. (obviously it can't).
> 
> ghes_edac_register() publishes the pointer before its called edac_mc_add_mc(), which is
> pleasant.
> 
> (sorry I've been sitting on this while I found new and exciting ways to break my test
> machine!)
> 
> Combined with the spinlocky bits of:
> --------------------------%<--------------------------
> >From 61fa061790fe7c19af25b25693b61bb75a498058 Mon Sep 17 00:00:00 2001
> From: James Morse <james.morse@arm.com>
> Date: Wed, 16 Oct 2019 10:02:15 +0100
> Subject: [PATCH] EDAC, ghes: Move ghes_init and ghes_pvt under the ghes_lock
> 
> ghes_edac has an irqsave spinlock that protects the contents of ghes_pvt,
> but not the pointer itself. The register, unregister and irq-handler
> functions read this bare global variable expecting to see NULL or the
> allocated value. Without READ_ONCE()/WRITE_ONCE(), this is racy.
> ghes_edac_register() also publishes the pointer before it has registered
> the mci.
> 
> Replace ghes_init with an unsigned int counter in ghes_pvt. To access
> this or read the ghes_pvt pointer, you must hold the ghes_lock. This
> prevents races when these values are modified.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/edac/ghes_edac.c | 69 ++++++++++++++++++++++++++++------------
>  1 file changed, 48 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 0bb62857ffb2..804bb07c6acf 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -15,7 +15,10 @@
>  #include "edac_module.h"
>  #include <ras/ras_event.h>
> 
> +/* Hold ghes_lock when accessing ghes_pvt */
>  struct ghes_edac_pvt {
> +	unsigned int users;
> +
>  	struct list_head list;
>  	struct ghes *ghes;
>  	struct mem_ctl_info *mci;
> @@ -26,7 +29,6 @@ struct ghes_edac_pvt {
>  	char msg[80];
>  };
> 
> -static atomic_t ghes_init = ATOMIC_INIT(0);
>  static struct ghes_edac_pvt *ghes_pvt;
> 
>  /*
> @@ -79,9 +81,8 @@ static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg)
>  		(*num_dimm)++;
>  }
> 
> -static int get_dimm_smbios_index(u16 handle)
> +static int get_dimm_smbios_index(u16 handle, struct mem_ctl_info *mci)
>  {
> -	struct mem_ctl_info *mci = ghes_pvt->mci;
>  	int i;
> 
>  	for (i = 0; i < mci->tot_dimms; i++) {
> @@ -197,15 +198,12 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err
> *mem_err)
>  {
>  	enum hw_event_mc_err_type type;
>  	struct edac_raw_error_desc *e;
> +	struct ghes_edac_pvt *pvt;
>  	struct mem_ctl_info *mci;
> -	struct ghes_edac_pvt *pvt = ghes_pvt;
>  	unsigned long flags;
>  	char *p;
>  	u8 grain_bits;
> 
> -	if (!pvt)
> -		return;
> -
>  	/*
>  	 * We can do the locking below because GHES defers error processing
>  	 * from NMI to IRQ context. Whenever that changes, we'd at least
> @@ -215,6 +213,11 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err
> *mem_err)
>  		return;
> 
>  	spin_lock_irqsave(&ghes_lock, flags);
> +	pvt = ghes_pvt;
> +	if (!pvt) {
> +		spin_unlock_irqrestore(&ghes_lock, flags);
> +		return;
> +	}
> 
>  	mci = pvt->mci;
>  	e = &mci->error_desc;
> @@ -348,7 +351,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
>  			p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
>  				     mem_err->mem_dev_handle);
> 
> -		index = get_dimm_smbios_index(mem_err->mem_dev_handle);
> +		index = get_dimm_smbios_index(mem_err->mem_dev_handle, mci);
>  		if (index >= 0) {
>  			e->top_layer = index;
>  			e->enable_per_layer_report = true;
> @@ -457,8 +460,10 @@ static struct acpi_platform_list plat_list[] = {
>  int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  {
>  	bool fake = false;
> +	unsigned long flags;
>  	int rc, num_dimm = 0;
>  	struct mem_ctl_info *mci;
> +	struct ghes_edac_pvt *pvt;
>  	struct edac_mc_layer layers[1];
>  	struct ghes_edac_dimm_fill dimm_fill;
>  	int idx = -1;
> @@ -475,7 +480,13 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  	/*
>  	 * We have only one logical memory controller to which all DIMMs belong.
>  	 */
> -	if (atomic_inc_return(&ghes_init) > 1)
> +	spin_lock_irqsave(&ghes_lock, flags);
> +	pvt = ghes_pvt;
> +	if (pvt)
> +		pvt->users++;

As written above, we should better use the mci's refcount here.

> +	spin_unlock_irqrestore(&ghes_lock, flags);
> +
> +	if (pvt)
>  		return 0;
> 
>  	/* Get the number of DIMMs */
> @@ -497,9 +508,10 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  		return -ENOMEM;
>  	}
> 
> -	ghes_pvt	= mci->pvt_info;
> -	ghes_pvt->ghes	= ghes;
> -	ghes_pvt->mci	= mci;
> +	pvt		= mci->pvt_info;
> +	pvt->ghes	= ghes;
> +	pvt->mci	= mci;
> +	pvt->users	= 1;
> 
>  	mci->pdev = dev;
>  	mci->mtype_cap = MEM_FLAG_EMPTY;
> @@ -543,21 +555,36 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>  		edac_mc_free(mci);
>  		return -ENODEV;
>  	}
> +
> +	spin_lock_irqsave(&ghes_lock, flags);
> +	WARN_ON_ONCE(ghes_pvt);
> +	ghes_pvt = pvt;
> +	spin_unlock_irqrestore(&ghes_lock, flags);

This still is not safe as ghes_pvt could being setup by another
instance in between.

> +
>  	return 0;
>  }
> 
>  void ghes_edac_unregister(struct ghes *ghes)
>  {
>  	struct mem_ctl_info *mci;
> +	bool do_free = false;
> +	unsigned long flags;
> 
> -	if (!ghes_pvt)
> -		return;
> -
> -	if (atomic_dec_return(&ghes_init))
> -		return;
> +	spin_lock_irqsave(&ghes_lock, flags);
> +	if (ghes_pvt) {
> +		ghes_pvt->users -= 1;
> +
> +		/* Are we the last user? */
> +		if (!ghes_pvt->users) {
> +			do_free = true;
> +			mci = ghes_pvt->mci;
> +			ghes_pvt = NULL;
> +		}
> +	}
> +	spin_unlock_irqrestore(&ghes_lock, flags);
> 
> -	mci = ghes_pvt->mci;
> -	ghes_pvt = NULL;
> -	edac_mc_del_mc(mci->pdev);
> -	edac_mc_free(mci);
> +	if (do_free) {
> +		edac_mc_del_mc(mci->pdev);
> +		edac_mc_free(mci);
> +	}

This does not look straight forward here, it is exactly what a
device release function is for.

-Robert

>  }
> -- 
> 2.20.1
> --------------------------%<--------------------------
> 
> 
> 
> Thanks,
> 
> James