| Message ID | 0be565d97438fe2a6d57354b3aa4e8626952a00b.1619857124.git.lukas@wunner.de |
|---|---|
| State | Accepted |
| Delegated to | Bjorn Helgaas |
| Series | [v2] PCI: pciehp: Ignore Link Down/Up caused by DPC |
On Sat, May 01, 2021 at 10:29:00AM +0200, Lukas Wunner wrote: > Downstream Port Containment (PCIe Base Spec, sec. 6.2.10) disables the > link upon an error and attempts to re-enable it when instructed by the > DPC driver. > > A slot which is both DPC- and hotplug-capable is currently brought down > by pciehp once DPC is triggered (due to the link change) and brought up > on successful recovery. That's undesirable, the slot should remain up > so that the hotplugged device remains bound to its driver. I think the slot being "brought down" means slot power is turned off, right? I reworded it along those lines and applied this to pci/hotplug for v5.14, thanks! > DPC notifies > the driver of the error and of successful recovery in pcie_do_recovery() > and the driver may then restore the device to working state. > > Moreover, Sinan points out that turning off slot power by pciehp may > foil recovery by DPC: Power off/on is a cold reset concurrently to > DPC's warm reset. Sathyanarayanan reports extended delays or failure > in link retraining by DPC if pciehp brings down the slot. > > Fix by detecting whether a Link Down event is caused by DPC and awaiting > recovery if so. On successful recovery, ignore both the Link Down and > the subsequent Link Up event. > > Afterwards, check whether the link is down to detect surprise-removal or > another DPC event immediately after DPC recovery. Ensure that the > corresponding DLLSC event is not ignored by synthesizing it and > invoking irq_wake_thread() to trigger a re-run of pciehp_ist(). > > The IRQ threads of the hotplug and DPC drivers, pciehp_ist() and > dpc_handler(), race against each other. If pciehp is faster than DPC, > it will wait until DPC recovery completes. > > Recovery consists of two steps: The first step (waiting for link > disablement) is recognizable by pciehp through a set DPC Trigger Status > bit. The second step (waiting for link retraining) is recognizable > through a newly introduced PCI_DPC_RECOVERING flag. > > If DPC is faster than pciehp, neither of the two flags will be set and > pciehp may glean the recovery status from the new PCI_DPC_RECOVERED flag. > The flag is zero if DPC didn't occur at all, hence DLLSC events are not > ignored by default. > > pciehp waits up to 4 seconds before assuming that DPC recovery failed > and bringing down the slot. This timeout is not taken from the spec > (it doesn't mandate one) but based on a report from Yicong Yang that > DPC may take a bit more than 3 seconds on HiSilicon's Kunpeng platform. > > The timeout is necessary because the DPC Trigger Status bit may never > clear: On Root Ports which support RP Extensions for DPC, the DPC > driver polls the DPC RP Busy bit for up to 1 second before giving up on > DPC recovery. Without the timeout, pciehp would then wait indefinitely > for DPC to complete. 
> > This commit draws inspiration from previous attempts to synchronize DPC > with pciehp: > > By Sinan Kaya, August 2018: > https://lore.kernel.org/linux-pci/20180818065126.77912-1-okaya@kernel.org/ > > By Ethan Zhao, October 2020: > https://lore.kernel.org/linux-pci/20201007113158.48933-1-haifeng.zhao@intel.com/ > > By Kuppuswamy Sathyanarayanan, March 2021: > https://lore.kernel.org/linux-pci/59cb30f5e5ac6d65427ceaadf1012b2ba8dbf66c.1615606143.git.sathyanarayanan.kuppuswamy@linux.intel.com/ > > Reported-by: Sinan Kaya <okaya@kernel.org> > Reported-by: Ethan Zhao <haifeng.zhao@intel.com> > Reported-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> > Tested-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> > Tested-by: Yicong Yang <yangyicong@hisilicon.com> > Signed-off-by: Lukas Wunner <lukas@wunner.de> > Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Ashok Raj <ashok.raj@intel.com> > Cc: Keith Busch <kbusch@kernel.org> > --- > drivers/pci/hotplug/pciehp_hpc.c | 36 ++++++++++++++++ > drivers/pci/pci.h | 4 ++ > drivers/pci/pcie/dpc.c | 74 +++++++++++++++++++++++++++++--- > 3 files changed, 109 insertions(+), 5 deletions(-) > > diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c > index fb3840e222ad..9d06939736c0 100644 > --- a/drivers/pci/hotplug/pciehp_hpc.c > +++ b/drivers/pci/hotplug/pciehp_hpc.c > @@ -563,6 +563,32 @@ void pciehp_power_off_slot(struct controller *ctrl) > PCI_EXP_SLTCTL_PWR_OFF); > } > > +static void pciehp_ignore_dpc_link_change(struct controller *ctrl, > + struct pci_dev *pdev, int irq) > +{ > + /* > + * Ignore link changes which occurred while waiting for DPC recovery. > + * Could be several if DPC triggered multiple times consecutively. > + */ > + synchronize_hardirq(irq); > + atomic_and(~PCI_EXP_SLTSTA_DLLSC, &ctrl->pending_events); > + if (pciehp_poll_mode) > + pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, > + PCI_EXP_SLTSTA_DLLSC); > + ctrl_info(ctrl, "Slot(%s): Link Down/Up ignored (recovered by DPC)\n", > + slot_name(ctrl)); > + > + /* > + * If the link is unexpectedly down after successful recovery, > + * the corresponding link change may have been ignored above. > + * Synthesize it to ensure that it is acted on. > + */ > + down_read(&ctrl->reset_lock); > + if (!pciehp_check_link_active(ctrl)) > + pciehp_request(ctrl, PCI_EXP_SLTSTA_DLLSC); > + up_read(&ctrl->reset_lock); > +} > + > static irqreturn_t pciehp_isr(int irq, void *dev_id) > { > struct controller *ctrl = (struct controller *)dev_id; > @@ -706,6 +732,16 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id) > PCI_EXP_SLTCTL_ATTN_IND_ON); > } > > + /* > + * Ignore Link Down/Up events caused by Downstream Port Containment > + * if recovery from the error succeeded. > + */ > + if ((events & PCI_EXP_SLTSTA_DLLSC) && pci_dpc_recovered(pdev) && > + ctrl->state == ON_STATE) { > + events &= ~PCI_EXP_SLTSTA_DLLSC; > + pciehp_ignore_dpc_link_change(ctrl, pdev, irq); > + } > + > /* > * Disable requests have higher priority than Presence Detect Changed > * or Data Link Layer State Changed events. 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index 4c13e2ff05eb..587cc92e182d 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -385,6 +385,8 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev) > > /* pci_dev priv_flags */ > #define PCI_DEV_ADDED 0 > +#define PCI_DPC_RECOVERED 1 > +#define PCI_DPC_RECOVERING 2 > > static inline void pci_dev_assign_added(struct pci_dev *dev, bool added) > { > @@ -439,10 +441,12 @@ void pci_restore_dpc_state(struct pci_dev *dev); > void pci_dpc_init(struct pci_dev *pdev); > void dpc_process_error(struct pci_dev *pdev); > pci_ers_result_t dpc_reset_link(struct pci_dev *pdev); > +bool pci_dpc_recovered(struct pci_dev *pdev); > #else > static inline void pci_save_dpc_state(struct pci_dev *dev) {} > static inline void pci_restore_dpc_state(struct pci_dev *dev) {} > static inline void pci_dpc_init(struct pci_dev *pdev) {} > +static inline bool pci_dpc_recovered(struct pci_dev *pdev) { return false; } > #endif > > #ifdef CONFIG_PCIEPORTBUS > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c > index e05aba86a317..c556e7beafe3 100644 > --- a/drivers/pci/pcie/dpc.c > +++ b/drivers/pci/pcie/dpc.c > @@ -71,6 +71,58 @@ void pci_restore_dpc_state(struct pci_dev *dev) > pci_write_config_word(dev, dev->dpc_cap + PCI_EXP_DPC_CTL, *cap); > } > > +static DECLARE_WAIT_QUEUE_HEAD(dpc_completed_waitqueue); > + > +#ifdef CONFIG_HOTPLUG_PCI_PCIE > +static bool dpc_completed(struct pci_dev *pdev) > +{ > + u16 status; > + > + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS, &status); > + if ((status != 0xffff) && (status & PCI_EXP_DPC_STATUS_TRIGGER)) > + return false; > + > + if (test_bit(PCI_DPC_RECOVERING, &pdev->priv_flags)) > + return false; > + > + return true; > +} > + > +/** > + * pci_dpc_recovered - whether DPC triggered and has recovered successfully > + * @pdev: PCI device > + * > + * Return true if DPC was triggered for @pdev and has recovered successfully. > + * Wait for recovery if it hasn't completed yet. Called from the PCIe hotplug > + * driver to recognize and ignore Link Down/Up events caused by DPC. > + */ > +bool pci_dpc_recovered(struct pci_dev *pdev) > +{ > + struct pci_host_bridge *host; > + > + if (!pdev->dpc_cap) > + return false; > + > + /* > + * Synchronization between hotplug and DPC is not supported > + * if DPC is owned by firmware and EDR is not enabled. > + */ > + host = pci_find_host_bridge(pdev->bus); > + if (!host->native_dpc && !IS_ENABLED(CONFIG_PCIE_EDR)) > + return false; > + > + /* > + * Need a timeout in case DPC never completes due to failure of > + * dpc_wait_rp_inactive(). The spec doesn't mandate a time limit, > + * but reports indicate that DPC completes within 4 seconds. > + */ > + wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev), > + msecs_to_jiffies(4000)); > + > + return test_and_clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags); > +} > +#endif /* CONFIG_HOTPLUG_PCI_PCIE */ > + > static int dpc_wait_rp_inactive(struct pci_dev *pdev) > { > unsigned long timeout = jiffies + HZ; > @@ -91,8 +143,11 @@ static int dpc_wait_rp_inactive(struct pci_dev *pdev) > > pci_ers_result_t dpc_reset_link(struct pci_dev *pdev) > { > + pci_ers_result_t ret; > u16 cap; > > + set_bit(PCI_DPC_RECOVERING, &pdev->priv_flags); > + > /* > * DPC disables the Link automatically in hardware, so it has > * already been reset by the time we get here. 
> @@ -106,18 +161,27 @@ pci_ers_result_t dpc_reset_link(struct pci_dev *pdev) > if (!pcie_wait_for_link(pdev, false)) > pci_info(pdev, "Data Link Layer Link Active not cleared in 1000 msec\n"); > > - if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev)) > - return PCI_ERS_RESULT_DISCONNECT; > + if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev)) { > + clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags); > + ret = PCI_ERS_RESULT_DISCONNECT; > + goto out; > + } > > pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS, > PCI_EXP_DPC_STATUS_TRIGGER); > > if (!pcie_wait_for_link(pdev, true)) { > pci_info(pdev, "Data Link Layer Link Active not set in 1000 msec\n"); > - return PCI_ERS_RESULT_DISCONNECT; > + clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags); > + ret = PCI_ERS_RESULT_DISCONNECT; > + } else { > + set_bit(PCI_DPC_RECOVERED, &pdev->priv_flags); > + ret = PCI_ERS_RESULT_RECOVERED; > } > - > - return PCI_ERS_RESULT_RECOVERED; > +out: > + clear_bit(PCI_DPC_RECOVERING, &pdev->priv_flags); > + wake_up_all(&dpc_completed_waitqueue); > + return ret; > } > > static void dpc_process_rp_pio_error(struct pci_dev *pdev) > -- > 2.30.2 >
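Condensed, the handshake that the quoted patch establishes between dpc_reset_link() and pciehp looks roughly as follows. This is a simplified sketch of the hunks above, not literal kernel code: the error paths, poll-mode handling and the DPC Trigger Status check inside dpc_completed() are omitted.

	/* DPC side -- dpc_reset_link(), condensed from the patch above */
	set_bit(PCI_DPC_RECOVERING, &pdev->priv_flags);
	/* ... link is already down; clear DPC Trigger Status, wait for retrain ... */
	if (pcie_wait_for_link(pdev, true))
		set_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
	else
		clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
	clear_bit(PCI_DPC_RECOVERING, &pdev->priv_flags);
	wake_up_all(&dpc_completed_waitqueue);

	/* pciehp side -- pci_dpc_recovered(), called from pciehp_ist() on DLLSC */
	wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
			   msecs_to_jiffies(4000));
	recovered = test_and_clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
	/* recovered: swallow the Link Down/Up and keep the slot in ON_STATE;
	 * otherwise: treat the DLLSC event as a genuine Link Down as before.
	 */

Whichever IRQ thread runs first, pciehp either waits on the waitqueue (DPC still in progress) or reads the final outcome from the PCI_DPC_RECOVERED bit (DPC already finished), with the 4-second timeout as the backstop described in the commit message.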
On Wed, Jun 16, 2021 at 05:19:45PM -0500, Bjorn Helgaas wrote: > On Sat, May 01, 2021 at 10:29:00AM +0200, Lukas Wunner wrote: > > Downstream Port Containment (PCIe Base Spec, sec. 6.2.10) disables the > > link upon an error and attempts to re-enable it when instructed by the > > DPC driver. > > > > A slot which is both DPC- and hotplug-capable is currently brought down > > by pciehp once DPC is triggered (due to the link change) and brought up > > on successful recovery. That's undesirable, the slot should remain up > > so that the hotplugged device remains bound to its driver. > > I think the slot being "brought down" means slot power is turned off, > right? > > I reworded it along those lines and applied this to pci/hotplug for > v5.14, thanks! Thanks, the reworded commit message LGTM and is more readable. "Being brought down" is just a colloquial term for pciehp_disable_slot(), i.e. unbinding and removal of the pci_dev's below the hotplug port, removing slot power, turning off the power LED and setting the slot's state to OFF_STATE. Indeed, turning off slot power concurrently to DPC recovery is wrong and likely the biggest contributor to the problems seen. Another issue is that after bringing down the slot due to the Link Change event, pciehp will notice that Presence Detect State is set and will try to bring the slot up again, even though DPC recovery may not have completed yet. The commit should solve all those synchronization issues between pciehp and DPC. Thanks, Lukas
On 6/20/2021 2:38 AM, Lukas Wunner wrote: > On Wed, Jun 16, 2021 at 05:19:45PM -0500, Bjorn Helgaas wrote: >> On Sat, May 01, 2021 at 10:29:00AM +0200, Lukas Wunner wrote: >>> Downstream Port Containment (PCIe Base Spec, sec. 6.2.10) disables the >>> link upon an error and attempts to re-enable it when instructed by the >>> DPC driver. >>> >>> A slot which is both DPC- and hotplug-capable is currently brought down >>> by pciehp once DPC is triggered (due to the link change) and brought up >>> on successful recovery. That's undesirable, the slot should remain up >>> so that the hotplugged device remains bound to its driver. >> >> I think the slot being "brought down" means slot power is turned off, >> right? >> >> I reworded it along those lines and applied this to pci/hotplug for >> v5.14, thanks! > > Thanks, the reworded commit message LGTM and is more readable. > > "Being brought down" is just a colloquial term for pciehp_disable_slot(), > i.e. unbinding and removal of the pci_dev's below the hotplug port, > removing slot power, turning off the power LED and setting the slot's > state to OFF_STATE. > > Indeed, turning off slot power concurrently to DPC recovery is wrong > and likely the biggest contributor to the problems seen. > > Another issue is that after bringing down the slot due to the Link Change > event, pciehp will notice that Presence Detect State is set and will try > to bring the slot up again, even though DPC recovery may not have completed > yet. > > The commit should solve all those synchronization issues between pciehp > and DPC. > > Thanks, > > Lukas > Lukas-- I have a system that is failing to recover after an EDR event with (or without...) this patch. It looks like the problem is similar to what this patch is trying to fix, except that on my system, the hotplug port is downstream of the root port that has DPC, so the "link down" event on it is not being ignored. So the hotplug code disables the slot (which contains an NVMe device on this system) while the nvme driver is trying to use it, which results in a failed recovery and another EDR event, and the kernel ends up with the DPC trigger status bit set in the root port, so everything downstream is gone. I added the hack below so the hotplug code will ignore the "link down" events on the ports downstream of the root port during DPC recovery, and it recovers no problem. (I'm not proposing this as a correct fix.) Does this sound like a real issue, or am I possibly misunderstanding something about how this should work? Thanks Stuart diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index b576aa890c76..dfd983c3c5bf 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -119,8 +132,10 @@ static int report_slot_reset(struct pci_dev *dev, void *data) !dev->driver->err_handler->slot_reset) goto out; + set_bit(PCI_DPC_RECOVERING, &dev->priv_flags); err_handler = dev->driver->err_handler; vote = err_handler->slot_reset(dev); + clear_bit(PCI_DPC_RECOVERING, &dev->priv_flags); *result = merge_result(*result, vote); out: device_unlock(&dev->dev);
On Fri, Jun 25, 2021 at 03:38:41PM -0500, stuart hayes wrote: > I have a system that is failing to recover after an EDR event with (or > without...) this patch. It looks like the problem is similar to what this > patch is trying to fix, except that on my system, the hotplug port is > downstream of the root port that has DPC, so the "link down" event on it is > not being ignored. So the hotplug code disables the slot (which contains an > NVMe device on this system) while the nvme driver is trying to use it, which > results in a failed recovery and another EDR event, and the kernel ends up > with the DPC trigger status bit set in the root port, so everything > downstream is gone. > > I added the hack below so the hotplug code will ignore the "link down" > events on the ports downstream of the root port during DPC recovery, and it > recovers no problem. (I'm not proposing this as a correct fix.) Please help me understand what's causing the Link Down event in the first place: With DPC, the hardware (only) disables the link on the port containing the error. Since that's the Root Port above the hotplug port in your case, the link between the hotplug port and the NVMe drive should remain up. Since your patch sets the PCI_DPC_RECOVERING flag during invocation of the dev->driver->err_handler->slot_reset() hook, I assume that's what's causing the Link Down. However pcie_portdrv_slot_reset() only restores and saves PCI config space, I don't think that's causing a Link Down? Is maybe nvme_slot_reset() causing the Link Down on the parent hotplug port? Thanks, Lukas > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > index b576aa890c76..dfd983c3c5bf 100644 > --- a/drivers/pci/pcie/err.c > +++ b/drivers/pci/pcie/err.c > @@ -119,8 +132,10 @@ static int report_slot_reset(struct pci_dev *dev, void > *data) > !dev->driver->err_handler->slot_reset) > goto out; > > + set_bit(PCI_DPC_RECOVERING, &dev->priv_flags); > err_handler = dev->driver->err_handler; > vote = err_handler->slot_reset(dev); > + clear_bit(PCI_DPC_RECOVERING, &dev->priv_flags); > *result = merge_result(*result, vote); > out: > device_unlock(&dev->dev);
On 6/26/2021 1:50 AM, Lukas Wunner wrote: > On Fri, Jun 25, 2021 at 03:38:41PM -0500, stuart hayes wrote: >> I have a system that is failing to recover after an EDR event with (or >> without...) this patch. It looks like the problem is similar to what this >> patch is trying to fix, except that on my system, the hotplug port is >> downstream of the root port that has DPC, so the "link down" event on it is >> not being ignored. So the hotplug code disables the slot (which contains an >> NVMe device on this system) while the nvme driver is trying to use it, which >> results in a failed recovery and another EDR event, and the kernel ends up >> with the DPC trigger status bit set in the root port, so everything >> downstream is gone. >> >> I added the hack below so the hotplug code will ignore the "link down" >> events on the ports downstream of the root port during DPC recovery, and it >> recovers no problem. (I'm not proposing this as a correct fix.) > > Please help me understand what's causing the Link Down event in the > first place: > > With DPC, the hardware (only) disables the link on the port containing the > error. Since that's the Root Port above the hotplug port in your case, > the link between the hotplug port and the NVMe drive should remain up. > > Since your patch sets the PCI_DPC_RECOVERING flag during invocation > of the dev->driver->err_handler->slot_reset() hook, I assume that's > what's causing the Link Down. However pcie_portdrv_slot_reset() > only restores and saves PCI config space, I don't think that's > causing a Link Down? > > Is maybe nvme_slot_reset() causing the Link Down on the parent hotplug port? > > Thanks, > > Lukas > Sorry for the delayed response--I was out of town. I believe the Link Down is happening because a hot reset is propagated down when the link is lost under the root port 64:02.0. From the PCIe Base Spec 5.0, section 6.6.1 "conventional reset": • For a Switch, the following must cause a hot reset to be sent on all Downstream Ports: ... ◦ The Data Link Layer of the Upstream Port reporting DL_Down status. In Switches that support Link speeds greater than 5.0 GT/s, the Upstream Port must direct the LTSSM of each Downstream Port to the Hot Reset state, but not hold the LTSSMs in that state. This permits each downstream Port to begin Link training immediately after its hot reset completes. This behavior is recommended for all Switches. ◦ Receiving a hot reset on the Upstream Port (end of paste from pci spec) For reference, here's the "lspci -t" output covering the root port 64:02.0 that is getting the DPC... there are NVMe drives at 69:00.0, 6a:00.0, 6c:00.0, and 6e:00.0, and a SAS controller at 79:00.0. +-[0000:64]-+-00.0 | +-00.1 | +-00.2 | +-00.4 | \-02.0-[65-79]----00.0-[66-79]--+-00.0-[67-70]----00.0-[68-70]--+-00.0-[69]----00.0 | | +-04.0-[6a]----00.0 | | +-08.0-[6b]-- | | +-0c.0-[6c]----00.0 | | +-10.0-[6d]-- | | +-14.0-[6e]----00.0 | | +-18.0-[6f]-- | | \-1c.0-[70]-- | +-04.0-[71-76]----00.0-[72-76]--+-10.0-[73]-- | | +-14.0-[74]-- | | +-18.0-[75]-- | | \-1c.0-[76]-- | +-08.0-[77-78]----00.0-[78]-- | \-1c.0-[79]----00.0 I put in some debug code to printk the config registers before the config space is restored. Before I trigger the DPC, the slot status register at 68:00.0 reads 0x40 (presence detected), and after the DPC (but before restoring PCI config space for 68:00.0), it reads 0x140 (DLL state changed + presence detected). Before config space is restored to 68:00.0, the command register is 0. 
After config space is restored, I see "pcieport 0000:68:00.0: pciehp: pending interrupts 0x0010 from Slot Status" followed by "...pciehp: Slot(211): Link Down". So I assume as soon as it is able to (when its config space is restored), 68:00.0 sends the hotplug interrupt, which takes down 69:00.0. >> >> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c >> index b576aa890c76..dfd983c3c5bf 100644 >> --- a/drivers/pci/pcie/err.c >> +++ b/drivers/pci/pcie/err.c >> @@ -119,8 +132,10 @@ static int report_slot_reset(struct pci_dev *dev, void >> *data) >> !dev->driver->err_handler->slot_reset) >> goto out; >> >> + set_bit(PCI_DPC_RECOVERING, &dev->priv_flags); >> err_handler = dev->driver->err_handler; >> vote = err_handler->slot_reset(dev); >> + clear_bit(PCI_DPC_RECOVERING, &dev->priv_flags); >> *result = merge_result(*result, vote); >> out: >> device_unlock(&dev->dev);
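For reference, the Slot Status values quoted above decode as follows with the kernel's PCI_EXP_SLTSTA_* definitions: 0x0040 is PDS (Presence Detect State), 0x0140 is DLLSC | PDS, and the "pending interrupts 0x0010" corresponds to the CC (Command Completed) bit. The helper below is purely illustrative and not part of any patch in this thread.

	#include <linux/pci.h>
	#include <linux/printk.h>

	/* Print which Slot Status bits are set in a raw SLTSTA value,
	 * e.g. decode_sltsta(0x0140) reports "DLLSC PDS".
	 */
	static void decode_sltsta(u16 sltsta)
	{
		pr_info("SLTSTA %#06x:%s%s%s%s\n", sltsta,
			sltsta & PCI_EXP_SLTSTA_DLLSC ? " DLLSC" : "",
			sltsta & PCI_EXP_SLTSTA_PDS   ? " PDS"   : "",
			sltsta & PCI_EXP_SLTSTA_PDC   ? " PDC"   : "",
			sltsta & PCI_EXP_SLTSTA_CC    ? " CC"    : "");
	}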
On Tue, Jul 06, 2021 at 05:15:02PM -0500, stuart hayes wrote: > I believe the Link Down is happening because a hot reset is propagated down > when the link is lost under the root port 64:02.0. From the PCIe Base Spec > 5.0, section 6.6.1 "conventional reset": [...] Hm, sounds plausible. Just so that I understand this correctly, the hotplug port at 0000:68:00.0 is DPC-capable, but the error that is contained by DPC at the Root Port occurs further up in the hierarchy, right? (I.e. somewhere above the hotplug port.) The patch you're using to work around the issue would break if the hotplug port is *not* DPC-capable. Yes, yes, I understand that it's not meant as a real patch, but it shows how tricky it is to fix the issue. I need to do a little more thinking what a proper solution could look like. Thanks, Lukas
On Fri, Jun 25, 2021 at 03:38:41PM -0500, stuart hayes wrote: > I have a system that is failing to recover after an EDR event with (or > without...) this patch. It looks like the problem is similar to what this > patch is trying to fix, except that on my system, the hotplug port is > downstream of the root port that has DPC, so the "link down" event on it is > not being ignored. So the hotplug code disables the slot (which contains an > NVMe device on this system) while the nvme driver is trying to use it, which > results in a failed recovery and another EDR event, and the kernel ends up > with the DPC trigger status bit set in the root port, so everything > downstream is gone. > > I added the hack below so the hotplug code will ignore the "link down" > events on the ports downstream of the root port during DPC recovery, and it > recovers no problem. (I'm not proposing this as a correct fix.) Could you test if the below patch fixes the issue? Note, this is a hack as well, but I can turn it into a proper patch if it works as expected. Thanks! Lukas -- >8 -- diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c index c7ff1eea225a..893c7ae1a54d 100644 --- a/drivers/pci/pcie/portdrv_pci.c +++ b/drivers/pci/pcie/portdrv_pci.c @@ -160,6 +160,10 @@ static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev, static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev) { + if (dev->is_hotplug_bridge) + pcie_capability_write_word(dev, PCI_EXP_SLTSTA, + PCI_EXP_SLTSTA_DLLSC); + pci_restore_state(dev); pci_save_state(dev); return PCI_ERS_RESULT_RECOVERED;
On 7/19/2021 10:10 AM, Lukas Wunner wrote: > On Fri, Jun 25, 2021 at 03:38:41PM -0500, stuart hayes wrote: >> I have a system that is failing to recover after an EDR event with (or >> without...) this patch. It looks like the problem is similar to what this >> patch is trying to fix, except that on my system, the hotplug port is >> downstream of the root port that has DPC, so the "link down" event on it is >> not being ignored. So the hotplug code disables the slot (which contains an >> NVMe device on this system) while the nvme driver is trying to use it, which >> results in a failed recovery and another EDR event, and the kernel ends up >> with the DPC trigger status bit set in the root port, so everything >> downstream is gone. >> >> I added the hack below so the hotplug code will ignore the "link down" >> events on the ports downstream of the root port during DPC recovery, and it >> recovers no problem. (I'm not proposing this as a correct fix.) > > Could you test if the below patch fixes the issue? > > Note, this is a hack as well, but I can turn it into a proper patch > if it works as expected. > > Thanks! > > Lukas > That does appear to fix the issue, thanks! Without your patch, the PCIe devices under 64:02.0 disappear (the triggered bit is still set in the DPC capability). With your patch, recovery is successful and all of the PCIe devices are still there. If you are interested, here's the log showing the EDR before and after the patch was applied: > [ 180.295300] pcieport 0000:64:02.0: EDR: EDR event received > [ 180.325225] pcieport 0000:64:02.0: DPC: containment event, status:0x1f07 source:0x0000 > [ 180.325229] pcieport 0000:64:02.0: DPC: RP PIO error detected > [ 180.325231] pcieport 0000:64:02.0: DPC: rp_pio_status: 0x00000000, rp_pio_mask: 0x00000303 > [ 180.325237] pcieport 0000:64:02.0: DPC: RP PIO severity=0x00010000, syserror=0x00000000, exception=0x00000000 > [ 180.325240] pcieport 0000:64:02.0: DPC: TLP Header: 0x00000002 0xfc2a3eff 0xbf5fffe0 0x00000000 > [ 180.325250] nvme nvme0: frozen state error detected, reset controller > [ 180.343110] nvme nvme1: frozen state error detected, reset controller > [ 180.357158] nvme nvme2: frozen state error detected, reset controller > [ 180.371203] nvme nvme3: frozen state error detected, reset controller > [ 180.385219] mpt3sas_cm0: PCI error: detected callback, state(2)!! > [ 180.534915] pcieport 0000:66:08.0: can't change power state from D3cold to D0 (config space inaccessible) > [ 180.551467] nvme nvme0: restart after slot reset > [ 180.551487] pcieport 0000:68:00.0: pciehp: Slot(211): Link Down > [ 180.551691] pcieport 0000:68:04.0: pciehp: Slot(210): Link Down > [ 180.551727] nvme nvme1: restart after slot reset > [ 180.612127] nvme nvme2: restart after slot reset > [ 180.612153] pcieport 0000:68:0c.0: pciehp: Slot(208): Link Down > [ 180.612422] pcieport 0000:68:14.0: pciehp: Slot(206): Link Down > [ 180.612439] nvme nvme3: restart after slot reset > [ 180.673870] nvme 0000:69:00.0: can't change power state from D3hot to D0 (config space inaccessible) > [ 180.675887] nvme nvme0: failed to mark controller CONNECTING > [ 180.675891] nvme nvme0: Removing after probe failure status: -16 > [ 180.675893] mpt3sas_cm0: PCI error: slot reset callback!! 
> [ 180.675946] mpt3sas_cm0: 63 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (15845080 kB) > [ 180.678894] nvme 0000:6a:00.0: can't change power state from D3hot to D0 (config space inaccessible) > [ 180.683932] nvme nvme1: Removing after probe failure status: -19 > [ 180.685886] nvme 0000:6c:00.0: can't change power state from D3hot to D0 (config space inaccessible) > [ 180.685914] nvme 0000:6e:00.0: can't change power state from D3hot to D0 (config space inaccessible) > [ 180.686149] nvme nvme2: Removing after probe failure status: -19 > [ 180.686174] nvme nvme3: Removing after probe failure status: -19 > [ 180.697896] nvme2n1: detected capacity change from 3125627568 to 0 > [ 180.698902] nvme3n1: detected capacity change from 15002931888 to 0 > [ 180.698949] nvme1n1: detected capacity change from 1562824368 to 0 > [ 180.709893] nvme0n1: detected capacity change from 732585168 to 0 > [ 188.331419] mpt3sas_cm0: _base_wait_for_doorbell_ack: failed due to timeout count(5000), int_status(ffffffff)! > [ 188.331433] mpt3sas_cm0: doorbell handshake sending request failed (line=6648) > [ 188.331439] mpt3sas_cm0: _base_get_ioc_facts: handshake failed (r=-14) > [ 188.331506] pcieport 0000:64:02.0: AER: device recovery failed > [ 188.337031] pcieport 0000:64:02.0: EDR: EDR event received > [ 188.369794] pcieport 0000:64:02.0: DPC: containment event, status:0x1f07 source:0x0000 > [ 188.369797] pcieport 0000:64:02.0: DPC: RP PIO error detected > [ 188.369799] pcieport 0000:64:02.0: DPC: rp_pio_status: 0x00000000, rp_pio_mask: 0x00000303 > [ 188.369802] pcieport 0000:64:02.0: DPC: RP PIO severity=0x00010000, syserror=0x00000000, exception=0x00000000 > [ 188.369806] pcieport 0000:64:02.0: DPC: TLP Header: 0x00000001 0xfc003e0f 0xbf30001c 0x00000000 > [ 188.369811] pci 0000:69:00.0: AER: can't recover (no error_detected callback) > [ 188.369812] pci 0000:6a:00.0: AER: can't recover (no error_detected callback) > [ 188.369814] pci 0000:6c:00.0: AER: can't recover (no error_detected callback) > [ 188.369815] pci 0000:6e:00.0: AER: can't recover (no error_detected callback) > [ 188.369819] mpt3sas_cm0: PCI error: detected callback, state(2)!! 
> [ 188.534982] pcieport 0000:64:02.0: AER: device recovery failed > [ 188.546336] mpt3sas_cm0: remove hba_port entry: 00000000d72416cf port: 0 from hba_port list > [ 188.546343] mpt3sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x54cd98f310000108) > [ 188.546346] mpt3sas_cm0: removing handle(0x0002), sas_addr(0x54cd98f310000108) > [ 188.546348] mpt3sas_cm0: enclosure logical id(0x54cd98f310000100), slot(0) > [ 188.558192] pcieport 0000:66:08.0: can't change power state from D3cold to D0 (config space inaccessible) > [ 189.790967] pcieport 0000:77:00.0: can't change power state from D3cold to D0 (config space inaccessible) > [ 189.791154] pcieport 0000:72:1c.0: pciehp: pcie_do_write_cmd: no response from device > [ 189.791168] pcieport 0000:72:1c.0: pciehp: pciehp_isr: no response from device > [ 189.791453] pcieport 0000:72:18.0: pciehp: pcie_do_write_cmd: no response from device > [ 189.791463] pcieport 0000:72:18.0: pciehp: pciehp_isr: no response from device > [ 189.791695] pcieport 0000:72:14.0: pciehp: pcie_do_write_cmd: no response from device > [ 189.791705] pcieport 0000:72:14.0: pciehp: pciehp_isr: no response from device > [ 189.791875] pcieport 0000:72:10.0: pciehp: pcie_do_write_cmd: no response from device > [ 189.791884] pcieport 0000:72:10.0: pciehp: pciehp_isr: no response from device > [ 189.792158] pcieport 0000:68:1c.0: pciehp: pcie_do_write_cmd: no response from device > [ 189.792168] pcieport 0000:68:1c.0: pciehp: pciehp_isr: no response from device > [ 189.792348] pcieport 0000:68:18.0: pciehp: pcie_do_write_cmd: no response from device > [ 189.792355] pcieport 0000:68:18.0: pciehp: pciehp_isr: no response from device > [ 189.792526] pcieport 0000:68:14.0: pciehp: pcie_do_write_cmd: no response from device > [ 189.792535] pcieport 0000:68:14.0: pciehp: pciehp_isr: no response from device > [ 189.792719] pcieport 0000:68:10.0: pciehp: pcie_do_write_cmd: no response from device > [ 189.792727] pcieport 0000:68:10.0: pciehp: pciehp_isr: no response from device > [ 189.792890] pcieport 0000:68:0c.0: pciehp: pcie_do_write_cmd: no response from device > [ 189.792898] pcieport 0000:68:0c.0: pciehp: pciehp_isr: no response from device > [ 189.793093] pcieport 0000:68:08.0: pciehp: pcie_do_write_cmd: no response from device > [ 189.793110] pcieport 0000:68:08.0: pciehp: pciehp_isr: no response from device > [ 189.793347] pcieport 0000:68:04.0: pciehp: pcie_do_write_cmd: no response from device > [ 189.793356] pcieport 0000:68:04.0: pciehp: pciehp_isr: no response from device > [ 189.793543] pcieport 0000:68:00.0: pciehp: pcie_do_write_cmd: no response from device > [ 189.793553] pcieport 0000:68:00.0: pciehp: pciehp_isr: no response from device > [ 189.793820] pci_bus 0000:69: busn_res: [bus 69] is released > [ 189.794176] pci_bus 0000:6a: busn_res: [bus 6a] is released > [ 189.794367] pci_bus 0000:6b: busn_res: [bus 6b] is released > [ 189.794532] pci_bus 0000:6c: busn_res: [bus 6c] is released > [ 189.794621] pci_bus 0000:6d: busn_res: [bus 6d] is released > [ 189.794758] pci_bus 0000:6e: busn_res: [bus 6e] is released > [ 189.794938] pci_bus 0000:6f: busn_res: [bus 6f] is released > [ 189.795089] pci_bus 0000:70: busn_res: [bus 70] is released > [ 189.795472] pci_bus 0000:68: busn_res: [bus 68-70] is released > [ 189.795649] pci_bus 0000:67: busn_res: [bus 67-70] is released > [ 189.795867] pci_bus 0000:73: busn_res: [bus 73] is released > [ 189.796096] pci_bus 0000:74: busn_res: [bus 74] is released > [ 189.796267] pci_bus 0000:75: busn_res: [bus 75] is 
released > [ 189.796450] pci_bus 0000:76: busn_res: [bus 76] is released > [ 189.796555] pci_bus 0000:72: busn_res: [bus 72-76] is released > [ 189.796691] pci_bus 0000:71: busn_res: [bus 71-76] is released > [ 189.796802] pci_bus 0000:78: busn_res: [bus 78] is released > [ 189.796962] pci_bus 0000:77: busn_res: [bus 77-78] is released > [ 189.797296] pci_bus 0000:79: busn_res: [bus 79] is released > [ 189.797410] pci_bus 0000:66: busn_res: [bus 66-79] is released After your patch: > [ 171.943721] pcieport 0000:64:02.0: EDR: EDR event received > [ 171.974078] pcieport 0000:64:02.0: DPC: containment event, status:0x1f07 source:0x0000 > [ 171.974084] pcieport 0000:64:02.0: DPC: RP PIO error detected > [ 171.974086] pcieport 0000:64:02.0: DPC: rp_pio_status: 0x00000000, rp_pio_mask: 0x00000303 > [ 171.974092] pcieport 0000:64:02.0: DPC: RP PIO severity=0x00010000, syserror=0x00000000, exception=0x00000000 > [ 171.974095] pcieport 0000:64:02.0: DPC: TLP Header: 0x00000002 0xfc023eff 0xbf5fffe0 0x00000000 > [ 171.974105] nvme nvme0: frozen state error detected, reset controller > [ 171.992269] nvme nvme1: frozen state error detected, reset controller > [ 172.006241] nvme nvme2: frozen state error detected, reset controller > [ 172.020233] nvme nvme3: frozen state error detected, reset controller > [ 172.038237] mpt3sas_cm0: PCI error: detected callback, state(2)!! > [ 172.205497] nvme nvme0: restart after slot reset > [ 172.205747] nvme nvme1: restart after slot reset > [ 172.206196] nvme nvme2: restart after slot reset > [ 172.206627] nvme nvme3: restart after slot reset > [ 172.208281] mpt3sas_cm0: PCI error: slot reset callback!! > [ 172.208361] mpt3sas_cm0: 63 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (15845080 kB) ...(cutting a lot of mpt3sas informational messages)... > [ 172.368391] mpt3sas_cm0: scan devices: complete > [ 174.553148] nvme nvme1: 8/0/0 default/read/poll queues > [ 174.554573] mpt3sas_cm0: PCI error: resume callback!! > [ 174.554980] pcieport 0000:64:02.0: AER: device recovery successful
On Mon, Jul 19, 2021 at 02:00:51PM -0500, stuart hayes wrote: > On 7/19/2021 10:10 AM, Lukas Wunner wrote: > > Could you test if the below patch fixes the issue? > > That does appear to fix the issue, thanks! Without your patch, the PCIe > devices under 64:02.0 disappear (the triggered bit is still set in the DPC > capability). With your patch, recovery is successful and all of the PCIe > devices are still there. Thanks for testing. The test patch clears DLLSC because the Hot Reset that is propagated down the hierarchy causes the link to flap. I'm wondering though if that's sufficient or if PDC needs to be cleared as well. According to PCIe Base Spec sec. 4.2.6, LTSSM transitions from "Hot Reset" state to "Detect", then "Polling". If I understand the table "Link Status Mapped to the LTSSM" in the spec correctly, in-band presence is 0b in Detect state, hence I'd expect PDC to flap as well as a result of a Hot Reset being propagated down the hierarchy. Does the hotplug port at 0000:68:00.0 support In-Band Presence Disable? That would explain why only clearing DLLSC is sufficient. The problem is, if PDC is cleared as well, we lose the ability to detect that a device was hot-removed while the reset was ongoing, which is unfortunate. If an error is handled by aer_root_reset() (instead of dpc_reset_link()) and the reset is performed at a hotplug port, then pciehp_reset_slot() is invoked: aer_root_reset() pci_bus_error_reset() pci_slot_reset() pci_reset_hotplug_slot() pciehp_reset_slot() pciehp_reset_slot() temporarily masks both DLLSC *and* PDC events, then performs a Secondary Bus Reset at the hotplug port. If there are further hotplug ports below that hotplug port where the SBR is performed, my expectation is that the Hot Reset is likewise propagated down the hierarchy (just as with DPC), so those cascaded hotplug ports should also see their link go down. In other words, the issue you're seeing isn't really DPC-specific. However, the test patch should fix the issue for AER-handled errors as well. Do you agree with this analysis or did I miss anything? Thanks, Lukas
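The pciehp_reset_slot() pattern referred to in the mail above is roughly the following. This is a paraphrased sketch, not the actual function body: the reset_lock handling, the Attention Button special case (which leaves PDC reporting enabled) and error handling are left out, and pcie_write_cmd()/pcie_write_cmd_nowait() are assumed here to be the pciehp-internal Slot Control helpers.

	/* Sketch of the mask -> reset -> clear -> unmask sequence described above. */
	u16 ctrl_mask = PCI_EXP_SLTCTL_DLLSCE | PCI_EXP_SLTCTL_PDCE;
	u16 stat_mask = PCI_EXP_SLTSTA_DLLSC  | PCI_EXP_SLTSTA_PDC;

	pcie_write_cmd(ctrl, 0, ctrl_mask);                  /* stop DLLSC/PDC reporting */
	rc = pci_bridge_secondary_bus_reset(ctrl->pcie->port);
	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, stat_mask); /* clear stale events */
	pcie_write_cmd_nowait(ctrl, ctrl_mask, ctrl_mask);   /* re-enable reporting */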
On 7/20/2021 1:57 AM, Lukas Wunner wrote: > On Mon, Jul 19, 2021 at 02:00:51PM -0500, stuart hayes wrote: >> On 7/19/2021 10:10 AM, Lukas Wunner wrote: >>> Could you test if the below patch fixes the issue? >> >> That does appear to fix the issue, thanks! Without your patch, the PCIe >> devices under 64:02.0 disappear (the triggered bit is still set in the DPC >> capability). With your patch, recovery is successful and all of the PCIe >> devices are still there. > > Thanks for testing. > > The test patch clears DLLSC because the Hot Reset that is propagated > down the hierarchy causes the link to flap. I'm wondering though if > that's sufficient or if PDC needs to be cleared as well. According > to PCIe Base Spec sec. 4.2.6, LTSSM transitions from "Hot Reset" state > to "Detect", then "Polling". If I understand the table "Link Status > Mapped to the LTSSM" in the spec correctly, in-band presence is 0b > in Detect state, hence I'd expect PDC to flap as well as a result of > a Hot Reset being propagated down the hierarchy. > I think the table "Link Status Mapped to the LTSSM" is saying that when in-band presence is 0, the LTSSM state must be "Detect" (not that being in "Detect" will force in-band presence to zero). I would not expect PDC to flap since the presence detect (even in-band) should not go away during hot reset. On the system I'm using, I modified the kernel to read and print the slot status register right before your test patch clears DLLSC, and it reads 0x140 (link status changed, presence is detected, but PDC is not set). > Does the hotplug port at 0000:68:00.0 support In-Band Presence Disable? > That would explain why only clearing DLLSC is sufficient. > No... the slot capabilities 2 register is 0. > The problem is, if PDC is cleared as well, we lose the ability to > detect that a device was hot-removed while the reset was ongoing, > which is unfortunate. > Agreed, but I don't think PDC should get set on hot reset. > If an error is handled by aer_root_reset() (instead of dpc_reset_link()) > and the reset is performed at a hotplug port, then pciehp_reset_slot() > is invoked: > > aer_root_reset() > pci_bus_error_reset() > pci_slot_reset() > pci_reset_hotplug_slot() > pciehp_reset_slot() > > pciehp_reset_slot() temporarily masks both DLLSC *and* PDC events, > then performs a Secondary Bus Reset at the hotplug port. > > If there are further hotplug ports below that hotplug port > where the SBR is performed, my expectation is that the Hot Reset > is likewise propagated down the hierarchy (just as with DPC), > so those cascaded hotplug ports should also see their link go down. > > In other words, the issue you're seeing isn't really DPC-specific. > However, the test patch should fix the issue for AER-handled errors > as well. Do you agree with this analysis or did I miss anything? > That looks correct to me. > Thanks, > > Lukas >
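Aside: the check Stuart describes ("the slot capabilities 2 register is 0") can be expressed with the kernel's config accessors roughly as below. This is an illustrative sketch, not part of the patches in this thread; PCI_EXP_SLTCAP2 and PCI_EXP_SLTCAP2_IBPD name the Slot Capabilities 2 register and its In-Band PD Disable Supported bit.

	#include <linux/pci.h>

	/* Returns true if the hotplug port advertises In-Band Presence Disable
	 * support in Slot Capabilities 2 (reads as 0 on the 0000:68:00.0 port
	 * discussed above, so IBPD is not supported there).
	 */
	static bool slot_supports_ibpd(struct pci_dev *hotplug_port)
	{
		u32 sltcap2 = 0;

		pcie_capability_read_dword(hotplug_port, PCI_EXP_SLTCAP2, &sltcap2);
		return sltcap2 & PCI_EXP_SLTCAP2_IBPD;
	}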
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index fb3840e222ad..9d06939736c0 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -563,6 +563,32 @@ void pciehp_power_off_slot(struct controller *ctrl) PCI_EXP_SLTCTL_PWR_OFF); } +static void pciehp_ignore_dpc_link_change(struct controller *ctrl, + struct pci_dev *pdev, int irq) +{ + /* + * Ignore link changes which occurred while waiting for DPC recovery. + * Could be several if DPC triggered multiple times consecutively. + */ + synchronize_hardirq(irq); + atomic_and(~PCI_EXP_SLTSTA_DLLSC, &ctrl->pending_events); + if (pciehp_poll_mode) + pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, + PCI_EXP_SLTSTA_DLLSC); + ctrl_info(ctrl, "Slot(%s): Link Down/Up ignored (recovered by DPC)\n", + slot_name(ctrl)); + + /* + * If the link is unexpectedly down after successful recovery, + * the corresponding link change may have been ignored above. + * Synthesize it to ensure that it is acted on. + */ + down_read(&ctrl->reset_lock); + if (!pciehp_check_link_active(ctrl)) + pciehp_request(ctrl, PCI_EXP_SLTSTA_DLLSC); + up_read(&ctrl->reset_lock); +} + static irqreturn_t pciehp_isr(int irq, void *dev_id) { struct controller *ctrl = (struct controller *)dev_id; @@ -706,6 +732,16 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id) PCI_EXP_SLTCTL_ATTN_IND_ON); } + /* + * Ignore Link Down/Up events caused by Downstream Port Containment + * if recovery from the error succeeded. + */ + if ((events & PCI_EXP_SLTSTA_DLLSC) && pci_dpc_recovered(pdev) && + ctrl->state == ON_STATE) { + events &= ~PCI_EXP_SLTSTA_DLLSC; + pciehp_ignore_dpc_link_change(ctrl, pdev, irq); + } + /* * Disable requests have higher priority than Presence Detect Changed * or Data Link Layer State Changed events. 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 4c13e2ff05eb..587cc92e182d 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -385,6 +385,8 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev) /* pci_dev priv_flags */ #define PCI_DEV_ADDED 0 +#define PCI_DPC_RECOVERED 1 +#define PCI_DPC_RECOVERING 2 static inline void pci_dev_assign_added(struct pci_dev *dev, bool added) { @@ -439,10 +441,12 @@ void pci_restore_dpc_state(struct pci_dev *dev); void pci_dpc_init(struct pci_dev *pdev); void dpc_process_error(struct pci_dev *pdev); pci_ers_result_t dpc_reset_link(struct pci_dev *pdev); +bool pci_dpc_recovered(struct pci_dev *pdev); #else static inline void pci_save_dpc_state(struct pci_dev *dev) {} static inline void pci_restore_dpc_state(struct pci_dev *dev) {} static inline void pci_dpc_init(struct pci_dev *pdev) {} +static inline bool pci_dpc_recovered(struct pci_dev *pdev) { return false; } #endif #ifdef CONFIG_PCIEPORTBUS diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index e05aba86a317..c556e7beafe3 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -71,6 +71,58 @@ void pci_restore_dpc_state(struct pci_dev *dev) pci_write_config_word(dev, dev->dpc_cap + PCI_EXP_DPC_CTL, *cap); } +static DECLARE_WAIT_QUEUE_HEAD(dpc_completed_waitqueue); + +#ifdef CONFIG_HOTPLUG_PCI_PCIE +static bool dpc_completed(struct pci_dev *pdev) +{ + u16 status; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS, &status); + if ((status != 0xffff) && (status & PCI_EXP_DPC_STATUS_TRIGGER)) + return false; + + if (test_bit(PCI_DPC_RECOVERING, &pdev->priv_flags)) + return false; + + return true; +} + +/** + * pci_dpc_recovered - whether DPC triggered and has recovered successfully + * @pdev: PCI device + * + * Return true if DPC was triggered for @pdev and has recovered successfully. + * Wait for recovery if it hasn't completed yet. Called from the PCIe hotplug + * driver to recognize and ignore Link Down/Up events caused by DPC. + */ +bool pci_dpc_recovered(struct pci_dev *pdev) +{ + struct pci_host_bridge *host; + + if (!pdev->dpc_cap) + return false; + + /* + * Synchronization between hotplug and DPC is not supported + * if DPC is owned by firmware and EDR is not enabled. + */ + host = pci_find_host_bridge(pdev->bus); + if (!host->native_dpc && !IS_ENABLED(CONFIG_PCIE_EDR)) + return false; + + /* + * Need a timeout in case DPC never completes due to failure of + * dpc_wait_rp_inactive(). The spec doesn't mandate a time limit, + * but reports indicate that DPC completes within 4 seconds. + */ + wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev), + msecs_to_jiffies(4000)); + + return test_and_clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags); +} +#endif /* CONFIG_HOTPLUG_PCI_PCIE */ + static int dpc_wait_rp_inactive(struct pci_dev *pdev) { unsigned long timeout = jiffies + HZ; @@ -91,8 +143,11 @@ static int dpc_wait_rp_inactive(struct pci_dev *pdev) pci_ers_result_t dpc_reset_link(struct pci_dev *pdev) { + pci_ers_result_t ret; u16 cap; + set_bit(PCI_DPC_RECOVERING, &pdev->priv_flags); + /* * DPC disables the Link automatically in hardware, so it has * already been reset by the time we get here. 
@@ -106,18 +161,27 @@ pci_ers_result_t dpc_reset_link(struct pci_dev *pdev) if (!pcie_wait_for_link(pdev, false)) pci_info(pdev, "Data Link Layer Link Active not cleared in 1000 msec\n"); - if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev)) - return PCI_ERS_RESULT_DISCONNECT; + if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev)) { + clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags); + ret = PCI_ERS_RESULT_DISCONNECT; + goto out; + } pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS, PCI_EXP_DPC_STATUS_TRIGGER); if (!pcie_wait_for_link(pdev, true)) { pci_info(pdev, "Data Link Layer Link Active not set in 1000 msec\n"); - return PCI_ERS_RESULT_DISCONNECT; + clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags); + ret = PCI_ERS_RESULT_DISCONNECT; + } else { + set_bit(PCI_DPC_RECOVERED, &pdev->priv_flags); + ret = PCI_ERS_RESULT_RECOVERED; } - - return PCI_ERS_RESULT_RECOVERED; +out: + clear_bit(PCI_DPC_RECOVERING, &pdev->priv_flags); + wake_up_all(&dpc_completed_waitqueue); + return ret; } static void dpc_process_rp_pio_error(struct pci_dev *pdev)