From patchwork Fri May 11 23:00:10 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Thor Thayer
X-Patchwork-Id: 10395495
From: thor.thayer@linux.intel.com
To: bp@alien8.de, mchehab@kernel.org, mark.rutland@arm.com,
 catalin.marinas@arm.com, will.deacon@arm.com
Cc: thor.thayer@linux.intel.com, linux-arm-kernel@lists.infradead.org,
 linux-edac@vger.kernel.org
Subject: [PATCH] edac: altera: Add Stratix10 SDRAM Uncorrectable Errors
Date: Fri, 11 May 2018 18:00:10 -0500
Message-Id: <1526079610-5527-1-git-send-email-thor.thayer@linux.intel.com>
X-Mailer: git-send-email 2.7.4

From: Thor Thayer

On Stratix10, uncorrectable errors are
routed to the SError exception instead of the IRQ exceptions.
In Stratix10, uncorrectable SErrors must be treated as fatal and
will cause a panic.

Older Altera/Intel parts printed out a message for UE so do that
here using the notifier framework.

Record the UE in sticky registers that retain the state through a
reset. Check these registers on probe and print out the error on
startup.

Depends on previous patch:
commit 2a4ff60626b0 ("arm64: dts: stratix10: add sdram ecc")

Signed-off-by: Thor Thayer
---
 drivers/edac/altera_edac.c | 67 +++++++++++++++++++++++++++++++++++++++-------
 drivers/edac/altera_edac.h | 8 +++++-
 2 files changed, 64 insertions(+), 11 deletions(-)

diff --git a/drivers/edac/altera_edac.c b/drivers/edac/altera_edac.c
index 0ee6d5969ef2..5672f6718262 100644
--- a/drivers/edac/altera_edac.c
+++ b/drivers/edac/altera_edac.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -725,6 +726,13 @@ static int altr_s10_sdram_probe(struct platform_device *pdev)
 		goto err2;
 	}
 
+	if (regmap_write(regmap, S10_SYSMGR_ECC_INTMASK_CLR_OFST,
+			 S10_DDR0_IRQ_MASK)) {
+		edac_printk(KERN_ERR, EDAC_MC,
+			    "Error clearing SDRAM ECC count\n");
+		return -ENODEV;
+	}
+
 	if (regmap_update_bits(drvdata->mc_vbase, priv->ecc_irq_en_offset,
 			       priv->ecc_irq_en_mask, priv->ecc_irq_en_mask)) {
 		edac_mc_printk(mci, KERN_ERR,
@@ -2228,23 +2236,50 @@ module_platform_driver(altr_edac_a10_driver);
 
 /************** Stratix 10 EDAC Device Controller Functions> ************/
 
+#define to_s10edac(p, m) container_of(p, struct altr_stratix10_edac, m)
+
+/*
+ * The double bit error is handled through SError which is fatal. This is
+ * called as a panic notifier to printout ECC error info as part of the panic.
+ */
+static int s10_edac_dberr_handler(struct notifier_block *this,
+				  unsigned long event, void *ptr)
+{
+	int bit, err_addr, dberror;
+	struct altr_stratix10_edac *edac = to_s10edac(this, panic_notifier);
+
+	s10_protected_reg_read(edac, S10_SYSMGR_ECC_INTSTAT_DERR_OFST,
+			       &dberror);
+	/* Remember the UE Errors for a reboot */
+	s10_protected_reg_write(edac, S10_SYSMGR_UE_VAL_OFST, dberror);
+	if (dberror & S10_DDR0_IRQ_MASK) {
+		s10_protected_reg_read(edac, S10_DERRADDR_OFST, &err_addr);
+		/* Remember the UE Error address */
+		s10_protected_reg_write(edac, S10_SYSMGR_UE_ADDR_OFST,
+					err_addr);
+		edac_printk(KERN_ERR, EDAC_MC,
+			    "EDAC: [Uncorrectable errors @ 0x%08X]\n\n",
+			    err_addr);
+	}
+
+	return NOTIFY_DONE;
+}
+
 static void altr_edac_s10_irq_handler(struct irq_desc *desc)
 {
-	int dberr, bit, sm_offset, irq_status;
+	int bit, sm_offset, irq_status;
 	struct altr_stratix10_edac *edac = irq_desc_get_handler_data(desc);
 	struct irq_chip *chip = irq_desc_get_chip(desc);
 	int irq = irq_desc_get_irq(desc);
 
-	dberr = (irq == edac->db_irq) ? 1 : 0;
-	sm_offset = dberr ? S10_SYSMGR_ECC_INTSTAT_DERR_OFST :
-			    S10_SYSMGR_ECC_INTSTAT_SERR_OFST;
+	sm_offset = S10_SYSMGR_ECC_INTSTAT_SERR_OFST;
 
 	chained_irq_enter(chip, desc);
 
 	s10_protected_reg_read(NULL, sm_offset, &irq_status);
 
 	for_each_set_bit(bit, (unsigned long *)&irq_status, 32) {
-		irq = irq_linear_revmap(edac->domain, dberr * 32 + bit);
+		irq = irq_linear_revmap(edac->domain, bit);
 		if (irq)
 			generic_handle_irq(irq);
 	}
@@ -2289,6 +2324,7 @@ static int altr_edac_s10_probe(struct platform_device *pdev)
 {
 	struct altr_stratix10_edac *edac;
 	struct device_node *child;
+	int dberror, err_addr;
 
 	edac = devm_kzalloc(&pdev->dev, sizeof(*edac), GFP_KERNEL);
 	if (!edac)
@@ -2318,11 +2354,22 @@ static int altr_edac_s10_probe(struct platform_device *pdev)
 					 altr_edac_s10_irq_handler,
 					 edac);
 
-	edac->db_irq = platform_get_irq(pdev, 1);
-	if (edac->db_irq >= 0)
-		irq_set_chained_handler_and_data(edac->db_irq,
-						 altr_edac_s10_irq_handler,
-						 edac);
+	edac->panic_notifier.notifier_call = s10_edac_dberr_handler;
+	atomic_notifier_chain_register(&panic_notifier_list,
+				       &edac->panic_notifier);
+
+	/* Printout a message if uncorrectable error previously. */
+	s10_protected_reg_read(edac, S10_SYSMGR_UE_VAL_OFST, &dberror);
+	if (dberror) {
+		s10_protected_reg_read(edac, S10_SYSMGR_UE_ADDR_OFST,
+				       &err_addr);
+		edac_printk(KERN_ERR, EDAC_DEVICE,
+			    "Previous Boot UE detected[0x%X] @ 0x%X\n",
+			    dberror, err_addr);
+		/* Reset the sticky registers */
+		s10_protected_reg_write(edac, S10_SYSMGR_UE_VAL_OFST, 0);
+		s10_protected_reg_write(edac, S10_SYSMGR_UE_ADDR_OFST, 0);
+	}
 
 	for_each_child_of_node(pdev->dev.of_node, child) {
 		if (!of_device_is_available(child))
diff --git a/drivers/edac/altera_edac.h b/drivers/edac/altera_edac.h
index 747481081072..81f0554e09de 100644
--- a/drivers/edac/altera_edac.h
+++ b/drivers/edac/altera_edac.h
@@ -180,6 +180,10 @@
 /* SDRAM Single Bit Error Count Compare Set Register */
 #define S10_SERRCNTREG_OFST 0xF801113C
 
+/* Sticky registers for Uncorrected Errors */
+#define S10_SYSMGR_UE_VAL_OFST 0xFFD12220
+#define S10_SYSMGR_UE_ADDR_OFST 0xFFD12224
+
 struct altr_sdram_prv_data {
 	int ecc_ctrl_offset;
 	int ecc_ctl_en_mask;
@@ -322,6 +326,8 @@ struct altr_sdram_mc_data {
 #define S10_SYSMGR_ECC_INTSTAT_SERR_OFST 0xFFD1209C
 #define S10_SYSMGR_ECC_INTSTAT_DERR_OFST 0xFFD120A0
 
+#define S10_DDR0_IRQ_MASK BIT(16)
+
 struct altr_edac_device_dev;
 
 struct edac_device_prv_data {
@@ -434,10 +440,10 @@ struct altr_arria10_edac {
 struct altr_stratix10_edac {
 	struct device *dev;
 	int sb_irq;
-	int db_irq;
 	struct irq_domain *domain;
 	struct irq_chip irq_chip;
 	struct list_head s10_ecc_devices;
+	struct notifier_block panic_notifier;
 };
 
 #endif /* #ifndef _ALTERA_EDAC_H */
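
For readers who want to follow the control flow outside the kernel, the
record-on-panic / report-on-probe pattern described in the commit message
can be sketched in user-space C. This is only an illustration under stated
assumptions: struct fake_regs, struct fake_edac, fake_dberr_handler() and
fake_probe() are hypothetical names, the registers are plain struct fields
rather than protected System Manager registers, and the notifier_block and
container_of definitions are simplified stand-ins, not the kernel's own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for the kernel's struct notifier_block (assumption). */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long event, void *ptr);
};

/* Simplified stand-in for the kernel's container_of (assumption). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define DDR0_IRQ_MASK (1u << 16)	/* mirrors S10_DDR0_IRQ_MASK, BIT(16) */

/* Simulated registers; on hardware the sticky pair survives a reset. */
struct fake_regs {
	uint32_t derr_status;	/* double-bit error status */
	uint32_t derr_addr;	/* address of the failing access */
	uint32_t sticky_val;	/* role of S10_SYSMGR_UE_VAL_OFST */
	uint32_t sticky_addr;	/* role of S10_SYSMGR_UE_ADDR_OFST */
};

struct fake_edac {
	struct fake_regs *regs;
	struct notifier_block panic_notifier;
};

/* Panic path: copy the volatile error state into the sticky registers. */
static int fake_dberr_handler(struct notifier_block *nb,
			      unsigned long event, void *ptr)
{
	/* Recover the enclosing device structure from the embedded notifier. */
	struct fake_edac *edac =
		container_of(nb, struct fake_edac, panic_notifier);
	uint32_t dberror = edac->regs->derr_status;

	(void)event;
	(void)ptr;

	edac->regs->sticky_val = dberror;
	if (dberror & DDR0_IRQ_MASK) {
		edac->regs->sticky_addr = edac->regs->derr_addr;
		printf("Uncorrectable error @ 0x%08X\n",
		       (unsigned int)edac->regs->derr_addr);
	}
	return 0;
}

/*
 * Probe path: report a UE left over from the previous boot, then clear
 * the sticky registers. Returns 1 if an error was reported, 0 otherwise.
 */
static int fake_probe(struct fake_edac *edac)
{
	uint32_t dberror;

	assert(edac && edac->regs);
	dberror = edac->regs->sticky_val;
	if (!dberror)
		return 0;

	printf("Previous Boot UE detected[0x%X] @ 0x%X\n",
	       (unsigned int)dberror,
	       (unsigned int)edac->regs->sticky_addr);
	edac->regs->sticky_val = 0;
	edac->regs->sticky_addr = 0;
	return 1;
}
```

A caller would set panic_notifier.notifier_call to fake_dberr_handler,
invoke it once to simulate the panic, and then call fake_probe() to
simulate the next boot. A second fake_probe() call reports nothing because
the sticky values were cleared, which is the behaviour the probe hunk above
aims for.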