From patchwork Tue Jun 4 17:54:34 2013
X-Patchwork-Submitter: Bjorn Helgaas
X-Patchwork-Id: 2661331
Date: Tue, 4 Jun 2013 11:54:34 -0600
From: Bjorn Helgaas
To: Betty Dall, rjw@sisk.pl, ying.huang@intel.com,
 linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-pci@vger.kernel.org
Subject: Re: [PATCH v2 2/3] ACPI/APEI: Force fatal AER severity when bus has been reset
Message-ID: <20130604175434.GA6548@google.com>
References: <1369924769-17183-1-git-send-email-betty.dall@hp.com>
 <1369924769-17183-3-git-send-email-betty.dall@hp.com>
 <20130604075336.GC20448@gchen.bj.intel.com>
In-Reply-To: <20130604075336.GC20448@gchen.bj.intel.com>

On Tue, Jun 04, 2013 at 03:53:36AM -0400, Chen Gong wrote:
> On Thu, May 30, 2013 at 08:39:28AM -0600, Betty Dall wrote:
> > Date: Thu, 30 May 2013 08:39:28 -0600
> > From: Betty Dall
> > To: rjw@sisk.pl, bhelgaas@google.com
> > Cc: ying.huang@intel.com, linux-acpi@vger.kernel.org,
> >  linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, Betty Dall
> > Subject: [PATCH v2 2/3] ACPI/APEI: Force fatal AER severity when bus has
> >  been reset
> > X-Mailer: git-send-email 1.7.7.6
> >
> > The CPER error record has a reset bit that indicates that the platform
> > has reset the bus.  The reset bit can be set for any severity error
> > including recoverable.  From the AER code path's perspective,
> > any error is fatal if the bus has been reset.  This patch upgrades the
> > severity of the AER recovery to AER_FATAL whenever the CPER error record
> > indicates that the bus has been reset.
> >
> > Changes since v1:
> > Fixed a typo in comment.
> >
> > Signed-off-by: Betty Dall
> > ---
> >
> >  drivers/acpi/apei/ghes.c |   21 ++++++++++++++++++++-
> >  1 files changed, 20 insertions(+), 1 deletions(-)
> >
> >
> > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> > index d668a8a..1c67d5a 100644
> > --- a/drivers/acpi/apei/ghes.c
> > +++ b/drivers/acpi/apei/ghes.c
> > @@ -451,7 +451,26 @@ static void ghes_do_proc(struct ghes *ghes,
> >  				int aer_severity;
> >  				devfn = PCI_DEVFN(pcie_err->device_id.device,
> >  						  pcie_err->device_id.function);
> > -				aer_severity = cper_severity_to_aer(sev);
> > +				/*
> > +				 * Some Firmware First implementations
> > +				 * put the device in SBR to contain
> > +				 * the error.  This is indicated by the
> > +				 * CPER Section Descriptor Flags reset
> > +				 * bit which means the component must
> > +				 * be re-initialized or re-enabled
> > +				 * prior to use.  Promoting the AER
> > +				 * serverity to FATAL will cause the
> > +				 * AER code to link_reset and allow
> > +				 * drivers to reprogram their cards.
> > +				 */
> > +				if (gdata->flags & CPER_SEC_RESET)
> > +					aer_severity = cper_severity_to_aer(
> > +						CPER_SEV_FATAL);
> > +				else
> > +					aer_severity =
> > +						cper_severity_to_aer(sev);
> > +
> > +
>
> How about this?
> 	if (gdata->flags & CPER_SEC_RESET)
> 		sev = CPER_SEV_FATAL;
> 	cper_severity_to_aer(sev);

No.  If the object is to make the severity AER_FATAL, you should just
do that.  You shouldn't fiddle around with the CPER severity, because
then you depend on the mapping performed by cper_severity_to_aer().
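
To make that dependency concrete, here is a minimal standalone sketch,
not kernel code.  Every SKETCH_*/sketch_* name, the flag value, and
both mapping functions are stand-ins invented for illustration; in
particular sketch_map_changed() is a purely hypothetical mapping, not
something cper_severity_to_aer() is known to do.

```c
/* Stand-in severities and flag; the values are arbitrary, not the kernel's. */
enum sketch_cper_sev { SKETCH_CPER_RECOVERABLE, SKETCH_CPER_FATAL, SKETCH_CPER_CORRECTED };
enum sketch_aer_sev { SKETCH_AER_NONFATAL, SKETCH_AER_FATAL, SKETCH_AER_CORRECTABLE };
#define SKETCH_SEC_RESET 0x1u

typedef int (*sketch_map_fn)(int cper_sev);

/* Assumed mapping for illustration: fatal maps to fatal. */
static int sketch_map_today(int cper_sev)
{
	switch (cper_sev) {
	case SKETCH_CPER_RECOVERABLE:
		return SKETCH_AER_NONFATAL;
	case SKETCH_CPER_FATAL:
		return SKETCH_AER_FATAL;
	default:
		return SKETCH_AER_CORRECTABLE;
	}
}

/* Hypothetical changed mapping: nothing maps to fatal any more. */
static int sketch_map_changed(int cper_sev)
{
	return cper_sev == SKETCH_CPER_CORRECTED ?
		SKETCH_AER_CORRECTABLE : SKETCH_AER_NONFATAL;
}

/* Indirect: rewrite the CPER severity and hope the mapping yields fatal. */
static int sketch_sev_indirect(int cper_sev, unsigned int flags, sketch_map_fn map)
{
	if (flags & SKETCH_SEC_RESET)
		cper_sev = SKETCH_CPER_FATAL;
	return map(cper_sev);
}

/* Direct: map first, then force the AER severity when the reset bit is set. */
static int sketch_sev_direct(int cper_sev, unsigned int flags, sketch_map_fn map)
{
	int aer_sev = map(cper_sev);

	if (flags & SKETCH_SEC_RESET)
		aer_sev = SKETCH_AER_FATAL;
	return aer_sev;
}
```

With sketch_map_today() both variants agree.  With sketch_map_changed()
only the direct variant still reports a fatal error for a reset
component, which is the reason to set the AER severity itself.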
>
> >  				aer_recover_queue(pcie_err->device_id.segment,
> >  						  pcie_err->device_id.bus,
> >  						  devfn, aer_severity);

In other words, something like the patch below.  I don't really care if
you use the original "if" above that only sets aer_severity once, or if
you overwrite it as below.  I overwrote it because it doesn't wrap as
many lines.

ghes_do_proc() really should just call helpers so the interesting code
doesn't have to be indented three and four tab stops.


ACPI/APEI: Force fatal AER severity when component has been reset

The CPER error record has a reset bit that indicates that the platform
has reset the component.  The reset bit can be set for any severity
error including recoverable.  From the AER code path's perspective, any
error is fatal if the component has been reset.  This patch upgrades
the severity of the AER recovery to AER_FATAL in this case.
---

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d668a8a..ab31551 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -449,9 +449,19 @@ static void ghes_do_proc(struct ghes *ghes,
 			    pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO) {
 				unsigned int devfn;
 				int aer_severity;
+
 				devfn = PCI_DEVFN(pcie_err->device_id.device,
 						  pcie_err->device_id.function);
 				aer_severity = cper_severity_to_aer(sev);
+
+				/*
+				 * If firmware reset the component to contain
+				 * the error, we must reinitialize it before
+				 * use, so treat it as a fatal AER error.
+				 */
+				if (gdata->flags & CPER_SEC_RESET)
+					aer_severity = AER_FATAL;
+
 				aer_recover_queue(pcie_err->device_id.segment,
 						  pcie_err->device_id.bus,
 						  devfn, aer_severity);
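
The remark about helpers could look roughly like the standalone sketch
below.  It is not a proposed patch: sketch_handle_aer(), both structs,
and the SKETCH_* constants are hypothetical stand-ins, and the request
struct only records what would be handed to aer_recover_queue().
SKETCH_DEVFN() is assumed to pack device and function the same way
PCI_DEVFN() does.

```c
#include <stdint.h>

/* Stand-in flag and severities; the values are arbitrary, not the kernel's. */
#define SKETCH_SEC_RESET 0x1u
#define SKETCH_DEVFN(dev, fn) ((((dev) & 0x1f) << 3) | ((fn) & 0x07))
enum { SKETCH_AER_NONFATAL, SKETCH_AER_FATAL, SKETCH_AER_CORRECTABLE };

/* Hypothetical, flattened view of one PCIe error section. */
struct sketch_section {
	unsigned int flags;	/* section descriptor flags */
	int aer_info_valid;	/* nonzero if AER info is present */
	uint16_t segment;
	uint8_t bus, device, function;
};

/* What the helper would pass on to the AER recovery queue. */
struct sketch_request {
	int queued;
	uint16_t segment;
	uint8_t bus;
	unsigned int devfn;
	int severity;
};

/*
 * All PCIe-specific handling lives at one indentation level here, so
 * the caller's loop body shrinks to a single call.
 */
static void sketch_handle_aer(const struct sketch_section *sec,
			      int mapped_sev, struct sketch_request *req)
{
	req->queued = 0;
	if (!sec->aer_info_valid)
		return;

	req->segment = sec->segment;
	req->bus = sec->bus;
	req->devfn = SKETCH_DEVFN(sec->device, sec->function);

	/* A component that firmware reset must be treated as fatal. */
	req->severity = (sec->flags & SKETCH_SEC_RESET) ?
		SKETCH_AER_FATAL : mapped_sev;
	req->queued = 1;
}
```

The early return replaces one level of nesting, and the reset override
sits next to the severity it modifies.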