From patchwork Fri Oct 18 21:27:41 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Borislav Petkov
X-Patchwork-Id: 3070121
Return-Path:
X-Original-To: patchwork-linux-acpi@patchwork.kernel.org
Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org
Received: from mail.kernel.org (mail.kernel.org [198.145.19.201])
	by patchwork1.web.kernel.org (Postfix) with ESMTP id A96549F2B6
	for ; Fri, 18 Oct 2013 21:28:17 +0000 (UTC)
Received: from mail.kernel.org (localhost [127.0.0.1])
	by mail.kernel.org (Postfix) with ESMTP id 7F87120489
	for ; Fri, 18 Oct 2013 21:28:08 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id 508ED20459
	for ; Fri, 18 Oct 2013 21:27:46 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757013Ab3JRV1p (ORCPT ); Fri, 18 Oct 2013 17:27:45 -0400
Received: from mail.skyhub.de ([78.46.96.112]:34997 "EHLO mail.skyhub.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1755651Ab3JRV1o (ORCPT ); Fri, 18 Oct 2013 17:27:44 -0400
X-Virus-Scanned: Nedap ESD1 at mail.skyhub.de
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=alien8.de; s=alien8;
	t=1382131663; bh=kY9kd1LanfkMvHIH8HQJKX2LK9BwuPOuwnGUqij/2t8=;
	h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version:
	Content-Type:In-Reply-To; b=jAD30e5QFRZsUxn4eCqIAvGmgZJ6x0iB6apzDe
	i9+Oqst8vZvLepUXnW9AH4NzIz3F3Z9vKQ/dy3TZBUNLIB33hWlI4qc3oN6M3WUC/hM
	UKlik63KyuVd3m43uscYA+NEf34wPdTZPw0/SlPz+suygtyhBohj9lCWr2c/uRcCq8=
Received: from mail.skyhub.de ([127.0.0.1])
	by localhost (door.skyhub.de [127.0.0.1]) (amavisd-new, port 10026)
	with ESMTP id 2M6-LBYPrR9E; Fri, 18 Oct 2013 23:27:43 +0200 (CEST)
Received: from liondog.tnic (p54B7EA0B.dip0.t-ipconnect.de [84.183.234.11])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.skyhub.de (SuperMail on ZX Spectrum 128k) with ESMTPSA
	id B4D491D955E; Fri, 18 Oct 2013 23:27:42 +0200 (CEST)
Received: by liondog.tnic (Postfix, from userid 1000)
	id 521CC1027DE; Fri, 18 Oct 2013 23:27:41 +0200 (CEST)
Date: Fri, 18 Oct 2013 23:27:41 +0200
From: Borislav Petkov
To: "Luck, Tony"
Cc: "Naveen N. Rao" , "Chen, Gong" , "joe@perches.com" ,
	"m.chehab@samsung.com" , "arozansk@redhat.com" ,
	"linux-acpi@vger.kernel.org" , "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform
Message-ID: <20131018212741.GA26049@pd.tnic>
References: <1382084624-10857-1-git-send-email-gong.chen@linux.intel.com>
	<1382084624-10857-5-git-send-email-gong.chen@linux.intel.com>
	<52612BA4.2060906@linux.vnet.ibm.com>
	<20131018125326.GC1007@pd.tnic>
	<3908561D78D1C84285E8C5FCA982C28F31D41E37@ORSMSX106.amr.corp.intel.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F31D41E37@ORSMSX106.amr.corp.intel.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-acpi-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-acpi@vger.kernel.org
X-Spam-Status: No, score=-7.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,
	RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD,T_DKIM_INVALID,UNPARSEABLE_RELAY
	autolearn=unavailable version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

On Fri, Oct 18, 2013 at 08:57:22PM +0000, Luck, Tony wrote:
> Long term ... I'd be happy to see mce_log() go away.
> But we need to have a robust, well tested replacement in place for some
> time before such a move is up for discussion.

Basically, a userspace daemon consuming the tracepoint or, plural,
tracepoints.

> Yes - double error reporting should be avoided.

Right.

> Our first platforms to implement this only do so for memory errors.
> This could change in the future (the UEFI appendix N error record has
> defined sub-sections for lots of types of errors).

Ok.

> Currently EDAC hooked into the mce event notification chain provides a
> return code to indicate whether it completely processed the error, or
> whether to fall through to the rest of mce_log():
>
> 	if (ret == NOTIFY_STOP)
> 		return;
>
> Having both EDAC and this new extended error log both registered on this
> chain would probably not be helpful in most cases.

Not only that - you don't need EDAC because all the information is in
the MCA registers and the eMCA supplement, if there is one. EDAC would
be used on older systems which don't sport eMCA.

Now, concerning the current situation, we probably want to do something
like this:

---
Right, we've moved the eMCA print thingie to mce_log so that we get a
chance to run the first TP issuing the raw MCA registers and then run
the eMCA TP as a follow-up.

We've taught mce_ext_err_print() to return a true/false retval to
denote:

* true: it has collected data successfully, no need to go down the
reporting chain

* false: eMCA failed somehow, log the error down and trigger mcelog in
userspace.

How does that sound?

> Not sure if we should handle that with user education to not load both
> an EDAC and ext_log driver or if there should be some enforcement.

Definitely enforcement. The flags thing I was telling you about recently
could be one way to do it.

> trace_mce_record() dumps the raw data from the machine check banks. I
> think there may still be a case for having this. Analysis tools that
> look at this trace as well should be smart enough to connect the dots.

Yes, sure.
The more non-overlapping data we get, the better.

Thanks.

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index b1b04123f3d9..382c78eaf474 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -154,6 +154,10 @@ void mce_log(struct mce *mce)
 	/* Emit the trace record: */
 	trace_mce_record(mce);
 
+	if (mce_ext_err_print)
+		if (mce_ext_err_print(NULL, mce->extcpu, mce->bank))
+			return;
+
 	ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, mce);
 	if (ret == NOTIFY_STOP)
 		return;
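
For illustration only, the ordering described above (raw trace record
first, then the eMCA hook, then the decoder chain, then mcelog) can be
modelled in plain userspace C. This is a rough sketch, not kernel code:
struct mce_model, ext_err_print_hook, decoder_chain_result, the MODEL_*
values and the stage enum are all hypothetical stand-ins invented for
this example, and the hook signature is simplified compared to
mce_ext_err_print() in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the two struct mce fields the hook looks at. */
struct mce_model {
	int extcpu;
	int bank;
};

/* Model-only values, not the kernel's NOTIFY_* constants. */
enum { MODEL_NOTIFY_DONE, MODEL_NOTIFY_STOP };

/* Which consumer ended up owning the error report. */
enum stage { STAGE_EXTLOG, STAGE_DECODER, STAGE_MCELOG };

typedef bool (*ext_err_print_fn)(const struct mce_model *m);

/* NULL models "no extended error log driver loaded". */
static ext_err_print_fn ext_err_print_hook;

/* What the (modelled) decoder chain, e.g. EDAC, would return. */
static int decoder_chain_result = MODEL_NOTIFY_DONE;

/* eMCA data collected successfully: nothing further needs to report. */
static bool extlog_ok(const struct mce_model *m)
{
	printf("extlog: eMCA record for cpu %d bank %d\n", m->extcpu, m->bank);
	return true;
}

/* eMCA failed somehow: the caller must keep going down the chain. */
static bool extlog_fail(const struct mce_model *m)
{
	(void)m;
	return false;
}

static enum stage mce_log_model(const struct mce_model *m)
{
	assert(m != NULL);

	/* 1. The raw MCA registers are always traced first. */
	printf("trace: raw MCA record, cpu %d bank %d\n", m->extcpu, m->bank);

	/* 2. Extended log hook; true means the error is fully reported. */
	if (ext_err_print_hook && ext_err_print_hook(m))
		return STAGE_EXTLOG;

	/* 3. Decoder chain; a stop result ends processing here. */
	if (decoder_chain_result == MODEL_NOTIFY_STOP)
		return STAGE_DECODER;

	/* 4. Fall through to the legacy mcelog path. */
	printf("mcelog: record queued for userspace\n");
	return STAGE_MCELOG;
}
```

Setting ext_err_print_hook to extlog_ok makes mce_log_model() stop at
STAGE_EXTLOG; with the hook unset or set to extlog_fail, the result
depends on decoder_chain_result, which is where the double reporting
question between EDAC and the extended log shows up.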