From patchwork Fri Apr 13 14:49:07 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alex_Gagniuc@Dellteam.com
X-Patchwork-Id: 10340255
X-Patchwork-Delegate: bhelgaas@google.com
Subject: RE: [PATCH 0/4] PCI/AER: Use-after-free fix
Date: Fri, 13 Apr 2018 14:49:07 +0000
References: <20180409220444.6632-1-keith.busch@intel.com> <20180412164709.spesry7skaa3x5hf@sbauer-Z170X-UD5>
In-Reply-To: <20180412164709.spesry7skaa3x5hf@sbauer-Z170X-UD5>
Sender: linux-pci-owner@vger.kernel.org
X-Mailing-List: linux-pci@vger.kernel.org

I got the cold chills when I realized you called for a delay of 350ms.
It's because 350ms is around the delay I've observed to be caused by FFS.
First run KASANed with the extra delay, so hopefully, I'll have more cement test results by EOB today.

Alex

-----Original Message-----
From: Scott Bauer [mailto:scott.bauer@intel.com]
Sent: Thursday, April 12, 2018 11:47 AM
To: Gagniuc, Alexandru - Dell Team
Cc: keith.busch@intel.com; linux-pci@vger.kernel.org; bhelgaas@google.com
Subject: Re: [PATCH 0/4] PCI/AER: Use-after-free fix

On Thu, Apr 12, 2018 at 05:06:05PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> From: Keith Busch [mailto:keith.busch@intel.com]
>
> > AER error handling walks the PCI topology below a root port, saving pointers of the pci_dev structs affected by the error along the way.
>
> Hi Keith,
>
> I've been trying to do an ABA test to confirm that your change eliminates the use-after-free issue we've seen. The race seems to be quite elusive, so I can't reliably reproduce it. Your changes have not been forgotten; I have them staged for further testing.
>
> Alex

If you need help triggering the race you can add a sleep/microsleep in
aer_isr_one_error(), between the find_source_device() and
aer_process_err_devices() calls:

sbauer@sbauer-Z170X-UD5:~/nvme_code/upstream_jens/linux-block$ git diff drivers/pci/pcie/aer/aerdrv_core.c

 		aer_print_port_info(p_device->port, e_info);
 
-		if (find_source_device(p_device->port, e_info))
+		if (find_source_device(p_device->port, e_info)) {
+			msleep(350);
 			aer_process_err_devices(p_device, e_info);
+		}
 	}
 }

diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index a4bfea52e7d4..5ca0c07b1d05 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -22,6 +22,7 @@
 #include
 #include
 #include
+#include
 #include "aerdrv.h"
 
 #define	PCI_EXP_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE | \
@@ -740,8 +741,10 @@ static void aer_isr_one_error(struct pcie_device *p_device,
 
 		aer_print_port_info(p_device->port, e_info);
 
-		if (find_source_device(p_device->port, e_info))
+		if (find_source_device(p_device->port, e_info)) {
+			msleep(350);
 			aer_process_err_devices(p_device, e_info);
+		}
 	}
 
 	if (e_src->status & PCI_ERR_ROOT_UNCOR_RCV) {
@@ -759,8 +762,10 @@ static void aer_isr_one_error(struct pcie_device *p_device,
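
For readers following the thread, here is a rough userspace sketch of why the msleep(350) makes the race easier to hit. It is not kernel code and not part of the patch: struct fake_dev, enum outcome and simulate() are hypothetical names, and the single-threaded millisecond timeline only stands in for the real concurrency between the AER handler and a hot-remove. The handler saves a pointer at t = 0 and uses it at t = delay_ms, while the removal happens at t = remove_at_ms.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a struct pci_dev; not a kernel type. */
struct fake_dev {
	bool present;	/* false once the simulated hot-remove has run */
};

/* Outcome of one simulated pass through the handler. */
enum outcome {
	USED_LIVE_DEV,	/* pointer used before the removal: race not hit */
	USED_STALE_DEV	/* pointer used after the removal: use-after-free */
};

/*
 * Walk a millisecond timeline. The handler saves its pointer at t = 0
 * and uses it at t = delay_ms; the removal runs at t = remove_at_ms.
 * If both fall on the same tick, the removal is assumed to win.
 */
static enum outcome simulate(unsigned int delay_ms, unsigned int remove_at_ms)
{
	struct fake_dev dev = { .present = true };
	struct fake_dev *saved = &dev;	/* the find_source_device() step */

	for (unsigned int t = 0; ; t++) {
		if (t == remove_at_ms)
			dev.present = false;	/* concurrent hot-remove */
		if (t == delay_ms)	/* the aer_process_err_devices() step */
			return saved->present ? USED_LIVE_DEV : USED_STALE_DEV;
	}
}
```

With no injected delay, simulate(0, 50) uses the pointer while the device is still present. With the 350 ms delay, simulate(350, 50) uses it after the removal. The 50 ms removal time is an arbitrary value chosen for illustration; the real window depends on when the device actually goes away.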