diff mbox

[1/2] x86/MCE: Export memory_error()

Message ID 20170425210740.GA15722@omniknight.lm.intel.com (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Verma, Vishal L April 25, 2017, 9:07 p.m. UTC
On 04/24, Borislav Petkov wrote:
> From: Borislav Petkov <bp@suse.de>
> Date: Mon, 24 Apr 2017 13:16:50 +0200
> Subject: [PATCH 1/2] x86/MCE: Export memory_error()
> 
> Export the function which checks whether an MCE is a memory error to
> other users so that we can reuse the logic. Drop the boot_cpu_data use,
> while at it, as mce.cpuvendor already has the CPU vendor in there.
> 
> Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
>  arch/x86/include/asm/mce.h       |  1 +
>  arch/x86/kernel/cpu/mcheck/mce.c | 12 +++++-------
>  2 files changed, 6 insertions(+), 7 deletions(-)
> 
Here is the updated patch to use the above helper:

8<-----


From 9661a85799c9067d762ecf29630f2b7f69897628 Mon Sep 17 00:00:00 2001
From: Vishal Verma <vishal.l.verma@intel.com>
Date: Tue, 25 Apr 2017 15:00:58 -0600
Subject: [PATCH v2] acpi, nfit: fix the memory error check in nfit_handle_mce

The check for an MCE being a memory error in the NFIT mce handler was
bogus. Export the new mce_is_memory_error helper, and use that tp
perform the correct check in the handler.

Reported-by: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 1 +
 drivers/acpi/nfit/mce.c          | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

This applies on tip/master + Borislav's patches in this thread above.
I'm not sure what the right process for queueing this for both
upstream and -stable is, so just replying here. Should I post it
independently?

Comments

Verma, Vishal L May 10, 2017, 7:31 p.m. UTC | #1
T24gVHVlLCAyMDE3LTA0LTI1IGF0IDE1OjA3IC0wNjAwLCBWaXNoYWwgVmVybWEgd3JvdGU6DQo+
IE9uIDA0LzI0LCBCb3Jpc2xhdiBQZXRrb3Ygd3JvdGU6DQo+ID4gRnJvbTogQm9yaXNsYXYgUGV0
a292IDxicEBzdXNlLmRlPg0KPiA+IERhdGU6IE1vbiwgMjQgQXByIDIwMTcgMTM6MTY6NTAgKzAy
MDANCj4gPiBTdWJqZWN0OiBbUEFUQ0ggMS8yXSB4ODYvTUNFOiBFeHBvcnQgbWVtb3J5X2Vycm9y
KCkNCj4gPiANCj4gPiBFeHBvcnQgdGhlIGZ1bmN0aW9uIHdoaWNoIGNoZWNrcyB3aGV0aGVyIGFu
IE1DRSBpcyBhIG1lbW9yeSBlcnJvciB0bw0KPiA+IG90aGVyIHVzZXJzIHNvIHRoYXQgd2UgY2Fu
IHJldXNlIHRoZSBsb2dpYy4gRHJvcCB0aGUgYm9vdF9jcHVfZGF0YQ0KPiA+IHVzZSwNCj4gPiB3
aGlsZSBhdCBpdCwgYXMgbWNlLmNwdXZlbmRvciBhbHJlYWR5IGhhcyB0aGUgQ1BVIHZlbmRvciBp
biB0aGVyZS4NCj4gPiANCj4gPiBTaWduZWQtb2ZmLWJ5OiBCb3Jpc2xhdiBQZXRrb3YgPGJwQHN1
c2UuZGU+DQo+ID4gLS0tDQo+ID4gwqBhcmNoL3g4Ni9pbmNsdWRlL2FzbS9tY2UuaMKgwqDCoMKg
wqDCoMKgfMKgwqAxICsNCj4gPiDCoGFyY2gveDg2L2tlcm5lbC9jcHUvbWNoZWNrL21jZS5jIHwg
MTIgKysrKystLS0tLS0tDQo+ID4gwqAyIGZpbGVzIGNoYW5nZWQsIDYgaW5zZXJ0aW9ucygrKSwg
NyBkZWxldGlvbnMoLSkNCj4gPiANCg0KSGkgQm9yaXMvVG9ueSwNCg0KSSBkaWRuJ3Qgc2VlIHRo
ZSBhYm92ZSBwYXRjaGVzIGluIHRoZSBSQVMgYnJhbmNoZXMgb2YgdGhlIHRpcCB0cmVlIC0NCndl
cmUgeW91IHRoaW5raW5nIHdlIHdvdWxkIGNhcnJ5IHRoZXNlIHRocm91Z2ggdGhlIG52ZGltbSB0
cmVlDQooaW5jbHVkaW5nIG15IHVwZGF0ZWQgcGF0Y2ggYmVsb3cpPw0KDQo+IA0KPiBIZXJlIGlz
IHRoZSB1cGRhdGVkIHBhdGNoIHRvIHVzZSB0aGUgYWJvdmUgaGVscGVyOg0KPiANCj4gODwtLS0t
LQ0KPiANCj4gDQo+IEZyb20gOTY2MWE4NTc5OWM5MDY3ZDc2MmVjZjI5NjMwZjJiN2Y2OTg5NzYy
OCBNb24gU2VwIDE3IDAwOjAwOjAwIDIwMDENCj4gRnJvbTogVmlzaGFsIFZlcm1hIDx2aXNoYWwu
bC52ZXJtYUBpbnRlbC5jb20+DQo+IERhdGU6IFR1ZSwgMjUgQXByIDIwMTcgMTU6MDA6NTggLTA2
MDANCj4gU3ViamVjdDogW1BBVENIIHYyXSBhY3BpLCBuZml0OiBmaXggdGhlIG1lbW9yeSBlcnJv
ciBjaGVjayBpbg0KPiBuZml0X2hhbmRsZV9tY2UNCj4gDQo+IFRoZSBjaGVjayBmb3IgYW4gTUNF
IGJlaW5nIGEgbWVtb3J5IGVycm9yIGluIHRoZSBORklUIG1jZSBoYW5kbGVyIHdhcw0KPiBib2d1
cy4gRXhwb3J0IHRoZSBuZXcgbWNlX2lzX21lbW9yeV9lcnJvciBoZWxwZXIsIGFuZCB1c2UgdGhh
dCB0cA0KPiBwZXJmb3JtIHRoZSBjb3JyZWN0IGNoZWNrIGluIHRoZSBoYW5kbGVyLg0KPiANCj4g
UmVwb3J0ZWQtYnk6IFRvbnkgTHVjayA8dG9ueS5sdWNrQGludGVsLmNvbT4NCj4gQ2M6IEJvcmlz
bGF2IFBldGtvdiA8YnBAc3VzZS5kZT4NCj4gQ2M6IDxzdGFibGVAdmdlci5rZXJuZWwub3JnPg0K
PiBTaWduZWQtb2ZmLWJ5OiBWaXNoYWwgVmVybWEgPHZpc2hhbC5sLnZlcm1hQGludGVsLmNvbT4N
Cj4gLS0tDQo+IMKgYXJjaC94ODYva2VybmVsL2NwdS9tY2hlY2svbWNlLmMgfCAxICsNCj4gwqBk
cml2ZXJzL2FjcGkvbmZpdC9tY2UuY8KgwqDCoMKgwqDCoMKgwqDCoMKgfCAyICstDQo+IMKgMiBm
aWxlcyBjaGFuZ2VkLCAyIGluc2VydGlvbnMoKyksIDEgZGVsZXRpb24oLSkNCj4gDQo+IFRoaXMg
YXBwbGllcyBvbiB0aXAvbWFzdGVyICsgQm9yaXNsYXYncyBwYXRjaGVzIGluIHRoaXMgdGhyZWFk
IGFib3ZlLg0KPiBJJ20gbm90IHN1cmUgd2hhdCB0aGUgcmlnaHQgcHJvY2VzcyBmb3IgcXVldWVp
bmcgdGhpcyBmb3IgYm90aA0KPiB1cHN0cmVhbSBhbmQgLXN0YWJsZSBpcywgc28ganVzdCByZXBs
eWluZyBoZXJlLiBTaG91bGQgSSBwb3N0IGl0DQo+IGluZGVwZW5kZW50bHk/DQo+IA0KPiBkaWZm
IC0tZ2l0IGEvYXJjaC94ODYva2VybmVsL2NwdS9tY2hlY2svbWNlLmMNCj4gYi9hcmNoL3g4Ni9r
ZXJuZWwvY3B1L21jaGVjay9tY2UuYw0KPiBpbmRleCAzNjE4NjVjYS4uNWNmYmFlYiAxMDA2NDQN
Cj4gLS0tIGEvYXJjaC94ODYva2VybmVsL2NwdS9tY2hlY2svbWNlLmMNCj4gKysrIGIvYXJjaC94
ODYva2VybmVsL2NwdS9tY2hlY2svbWNlLmMNCj4gQEAgLTUyNyw2ICs1MjcsNyBAQCBib29sIG1j
ZV9pc19tZW1vcnlfZXJyb3Ioc3RydWN0IG1jZSAqbSkNCj4gwqANCj4gwqAJcmV0dXJuIGZhbHNl
Ow0KPiDCoH0NCj4gK0VYUE9SVF9TWU1CT0xfR1BMKG1jZV9pc19tZW1vcnlfZXJyb3IpOw0KPiDC
oA0KPiDCoHN0YXRpYyBib29sIGNlY19hZGRfbWNlKHN0cnVjdCBtY2UgKm0pDQo+IMKgew0KPiBk
aWZmIC0tZ2l0IGEvZHJpdmVycy9hY3BpL25maXQvbWNlLmMgYi9kcml2ZXJzL2FjcGkvbmZpdC9t
Y2UuYw0KPiBpbmRleCAzYmExYzM0Li5mZDg2YmVjIDEwMDY0NA0KPiAtLS0gYS9kcml2ZXJzL2Fj
cGkvbmZpdC9tY2UuYw0KPiArKysgYi9kcml2ZXJzL2FjcGkvbmZpdC9tY2UuYw0KPiBAQCAtMjYs
NyArMjYsNyBAQCBzdGF0aWMgaW50IG5maXRfaGFuZGxlX21jZShzdHJ1Y3Qgbm90aWZpZXJfYmxv
Y2sNCj4gKm5iLCB1bnNpZ25lZCBsb25nIHZhbCwNCj4gwqAJc3RydWN0IG5maXRfc3BhICpuZml0
X3NwYTsNCj4gwqANCj4gwqAJLyogV2Ugb25seSBjYXJlIGFib3V0IG1lbW9yeSBlcnJvcnMgKi8N
Cj4gLQlpZiAoIShtY2UtPnN0YXR1cyAmIE1DQUNPRCkpDQo+ICsJaWYgKCFtY2VfaXNfbWVtb3J5
X2Vycm9yKG1jZSkpDQo+IMKgCQlyZXR1cm4gTk9USUZZX0RPTkU7DQo+IMKgDQo+IMKgCS8q
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Borislav Petkov May 10, 2017, 8:04 p.m. UTC | #2
On Wed, May 10, 2017 at 07:31:30PM +0000, Verma, Vishal L wrote:
> I didn't see the above patches in the RAS branches of the tip tree -

You need to be patient - we have merge window right now. Next week,
after -rc1 releases, it is business as usual again.
Verma, Vishal L May 10, 2017, 8:06 p.m. UTC | #3
On Wed, 2017-05-10 at 22:04 +0200, Borislav Petkov wrote:
> On Wed, May 10, 2017 at 07:31:30PM +0000, Verma, Vishal L wrote:

> > I didn't see the above patches in the RAS branches of the tip tree -

> 

> You need to be patient - we have merge window right now. Next week,

> after -rc1 releases, it is business as usual again.

> 

Ah I was under the impression that this can go in for 4.12..
Borislav Petkov May 10, 2017, 8:08 p.m. UTC | #4
On Wed, May 10, 2017 at 08:06:53PM +0000, Verma, Vishal L wrote:
> Ah I was under the impression that this can go in for 4.12..

... and the reason for hurrying it into 4.12 is?
Verma, Vishal L May 10, 2017, 9:12 p.m. UTC | #5
On Wed, 2017-05-10 at 22:08 +0200, Borislav Petkov wrote:
> On Wed, May 10, 2017 at 08:06:53PM +0000, Verma, Vishal L wrote:

> > Ah I was under the impression that this can go in for 4.12..

> 

> ... and the reason for hurrying it into 4.12 is?

> 

The memory error check in the nfit handler is a valid, and simple fix.
That said, I understand if you want additional soak time for the full
set.

Thanks,
	-Vishal
Borislav Petkov May 10, 2017, 9:57 p.m. UTC | #6
On Wed, May 10, 2017 at 09:12:12PM +0000, Verma, Vishal L wrote:
> The memory error check in the nfit handler is a valid, and simple fix.

I need the big picture here: "Without this fix, the nfit handler ...".

Then, if the stable rules apply, we can always expedite it through
urgent/stable.
Verma, Vishal L May 10, 2017, 10:03 p.m. UTC | #7
On Wed, 2017-05-10 at 23:57 +0200, Borislav Petkov wrote:
> On Wed, May 10, 2017 at 09:12:12PM +0000, Verma, Vishal L wrote:

> > The memory error check in the nfit handler is a valid, and simple

> > fix.

> 

> I need the big picture here: "Without this fix, the nfit handler ...".


..will potentially add bogus address to an 'error list', even when there
may not have been a memory error. (can mce->addr have an address when
the mce is not due to a memory error?)
The result of adding an address to this list is that future accesses to
this location will prematurely error out. Depending on how frequently
machine checks happen that are not memory errors but have the addr field
set (hopefully rare anyway), we could be incorrectly marking a lot of
locations as media errors.

> 

> Then, if the stable rules apply, we can always expedite it through

> urgent/stable.

>
Borislav Petkov May 10, 2017, 10:16 p.m. UTC | #8
On Wed, May 10, 2017 at 10:03:42PM +0000, Verma, Vishal L wrote:
> ... Depending on how frequently machine checks happen that are not
> memory errors but have the addr field set (hopefully rare anyway), we
> could be incorrectly marking a lot of locations as media errors.

Sounds serious enough to me, thanks.

I'll prep the queue next week and run tests.
Verma, Vishal L May 10, 2017, 10:22 p.m. UTC | #9
On Thu, 2017-05-11 at 00:16 +0200, Borislav Petkov wrote:
> On Wed, May 10, 2017 at 10:03:42PM +0000, Verma, Vishal L wrote:

> > ... Depending on how frequently machine checks happen that are not

> > memory errors but have the addr field set (hopefully rare anyway),

> > we

> > could be incorrectly marking a lot of locations as media errors.

> 

> Sounds serious enough to me, thanks.

> 

> I'll prep the queue next week and run tests.

> 

Ok, Thanks Boris!

	-Vishal
Borislav Petkov May 17, 2017, 12:38 p.m. UTC | #10
On Wed, May 10, 2017 at 10:22:32PM +0000, Verma, Vishal L wrote:
> > I'll prep the queue next week and run tests.
> > 
> Ok, Thanks Boris!

Ok, I've pushed a branch:

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=tip-ras-pending

Please have a look, test, poke, etc...

Thanks.
Verma, Vishal L May 17, 2017, 6:58 p.m. UTC | #11
On Wed, 2017-05-17 at 14:38 +0200, Borislav Petkov wrote:
> On Wed, May 10, 2017 at 10:22:32PM +0000, Verma, Vishal L wrote:

> > > I'll prep the queue next week and run tests.

> > > 

> > 

> > Ok, Thanks Boris!

> 

> Ok, I've pushed a branch:

> 

> https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=tip-r

> as-pending

> 

> Please have a look, test, poke, etc...


Quick/minor observation - you moved the EXPORT_SYMBOL_GPL into your
original patch, but the commit message of mine still talks about
exporting. Perhaps it should read:

"Use the new mce_is_memory_error() helper.."

Thanks,
	Vishal

> 

> Thanks.

>
Borislav Petkov May 17, 2017, 7:20 p.m. UTC | #12
On Wed, May 17, 2017 at 06:58:00PM +0000, Verma, Vishal L wrote:
> Quick/minor observation - you moved the EXPORT_SYMBOL_GPL into your

Yes, it belongs there conceptually.

> original patch, but the commit message of mine still talks about
> exporting. Perhaps it should read:
> 
> "Use the new mce_is_memory_error() helper.."

Fixed, thanks.
diff mbox

Patch

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 361865ca..5cfbaeb 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -527,6 +527,7 @@  bool mce_is_memory_error(struct mce *m)
 
 	return false;
 }
+EXPORT_SYMBOL_GPL(mce_is_memory_error);
 
 static bool cec_add_mce(struct mce *m)
 {
diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
index 3ba1c34..fd86bec 100644
--- a/drivers/acpi/nfit/mce.c
+++ b/drivers/acpi/nfit/mce.c
@@ -26,7 +26,7 @@  static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
 	struct nfit_spa *nfit_spa;
 
 	/* We only care about memory errors */
-	if (!(mce->status & MCACOD))
+	if (!mce_is_memory_error(mce))
 		return NOTIFY_DONE;
 
 	/*