From patchwork Thu Oct 24 17:21:46 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sergei Miroshnichenko
X-Patchwork-Id: 11210495
From: Sergey Miroshnichenko
To: ,
CC: Bjorn Helgaas , , Sergey Miroshnichenko
Subject: [PATCH RFC 00/11] PCI: hotplug: Movable bus numbers
Date: Thu, 24 Oct 2019 20:21:46 +0300
Message-ID: <20191024172157.878735-1-s.miroshnichenko@yadro.com>
X-Mailer: git-send-email 2.23.0
X-Mailing-List: linux-pci@vger.kernel.org

To allow hotplugging bridges, the kernel or the BIOS/bootloader/firmware
reserves extra bus numbers per slot, but this range may not be enough for a
large bridge and/or nested bridges when hot-adding a chassis full of devices.

This patchset proposes an approach similar to movable BARs: bus numbers are no
longer reserved; instead, the kernel moves the "tail" of the PCI tree by one
whenever a new bus is needed.

When something like this is going to happen:

                                                                  *LARGE*
  +-[0020:00]---00.0-[01-20]--+-00.0-[02-08]--+-00.0-[03]--  <--  *NESTED*
  |                           |               +-01.0-[04]--       *BRIDGE*
  |                           |               +-02.0-[05]--
  |                           |               +-03.0-[06]--
  |                           |               +-04.0-[07]--
  |                           |               \-05.0-[08]--
  ...

this will result in the following:

  +-[0020:00]---00.0-[01-22]--+-00.0-[02-22]--+-00.0-[03-1d]----04.0-[04-1d]--+-00.0-[05]--
  |                           |               |                               +-04.0-[06]--
  |                           |               |                               +-09.0-[07]--
  |                           |               |                               +-0c.0-[08-19]----00.0-[09-19]--+-01.0-[0a]--
  |                           |               |                               |                               ...
  |                           |               |                               |                               \-11.0-[19]--
  |                           |               |                               ...
  |                           |               |                               \-15.0-[1d]--
  |                           |               +-01.0-[1e]--  <-- Renamed from 04
  |                           |               +-02.0-[1f]--  <-- Renamed from 05
  |                           |               +-03.0-[20]--  <-- Renamed from 06
  |                           |               +-04.0-[21]--  <-- Renamed from 07
  |                           |               \-05.0-[22]--  <-- Renamed from 08
  ...

This looks to be safe in the kernel, because drivers don't use the raw PCI BDF
ID, and we've tested that on our x86 and PowerNV machines: mass storage with
root filesystems and network adapters just continued to work while their bus
numbers were being moved.

But here comes the userspace:

 - procfs entries:

   % ls -la /proc/bus/pci/*
   /proc/bus/pci/00:
   00.0  02.0  ...  1f.4  1f.6
   /proc/bus/pci/04:
   00.0
   /proc/bus/pci/40:
   00.0

 - sysfs entries:

   % ls -la /sys/devices/pci0000:00/
   0000:00:00.0  0000:00:02.0  ...  0000:00:1f.3  0000:00:1f.4  0000:00:1f.6

   % ls -la /sys/devices/pci0000:00/0000:00:1c.6/0000:04:00.0/driver
   driver -> ../../../../bus/pci/drivers/iwlwifi

 - sysfs symlinks:

   % ls -la /sys/bus/pci/devices
   0000:00:00.0 -> ../../../devices/pci0000:00/0000:00:00.0
   0000:00:02.0 -> ../../../devices/pci0000:00/0000:00:02.0
   ...
   0000:04:00.0 -> ../../../devices/pci0000:00/0000:00:1c.6/0000:04:00.0
   0000:40:00.0 -> ../../../devices/pci0000:00/0000:00:1d.2/0000:40:00.0

These patches alter the kernel public API and some internals to be able to
remove these files before changing a bus number, and to create new versions of
them after a device has changed its BDF.

On the one hand, this makes hotplug predictable, independent of non-kernel
program components (BIOS, bootloader, etc.) and cross-platform; on the other
hand, it is a severe ABI violation.

udev should probably get a new action like "rename" in addition to "add" and
"remove".

Would it be feasible to keep this feature disabled by default, with the option
to enable it via a kernel command line argument such as
pci=realloc,movable_buses?

This code is a follow-up to the "PCI: Allow BAR movement during hotplug" series
(v6).

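
To illustrate what moving the "tail" by one means for the bus number ranges,
here is a minimal user-space sketch in C. It is not code from this series:
struct bridge, insert_bus() and the flat array with parent indexes are
hypothetical names made up for this example. It also assumes tightly packed
ranges, so every enclosing bridge simply grows by one; it does not consume
spare bus numbers, reprogram any hardware, or rename procfs/sysfs entries.

```c
#include <assert.h>

/* Hypothetical, simplified bridge model; NOT the kernel's struct pci_bus. */
struct bridge {
	int parent;               /* index of the upstream bridge, -1 for the root */
	unsigned int secondary;   /* bus number directly below this bridge */
	unsigned int subordinate; /* highest bus number below this bridge */
};

/*
 * Make room for one new bus directly below bridge p, right after the
 * buses it already has. Returns the new bus number, or -1 if this
 * simplified model has no bus number left to shift into.
 */
static int insert_bus(struct bridge *b, int count, int p)
{
	unsigned int new_bus;
	int i;

	assert(p >= 0 && p < count);
	new_bus = b[p].subordinate + 1;

	/* Conservative check: bus numbers are 8 bits wide. */
	for (i = 0; i < count; i++)
		if (b[i].subordinate >= 0xff)
			return -1;

	/* Move the "tail": every bridge numbered at or after new_bus. */
	for (i = 0; i < count; i++) {
		if (b[i].secondary >= new_bus) {
			b[i].secondary++;
			b[i].subordinate++;
		}
	}

	/* Grow bridge p and every bridge above it to cover the new bus. */
	for (i = p; i != -1; i = b[i].parent)
		b[i].subordinate++;

	return (int)new_bus;
}
```

With the first tree above reduced to tight ranges (root port [01-08] instead of
[01-20]), calling insert_bus() for the bridge at 00.0-[03] returns 04, turns
that bridge into [03-04], renames its siblings [04]..[08] to [05]..[09], and
grows both upstream bridges to end at 09.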
Sergey Miroshnichenko (11):
  PCI: sysfs: Nullify freed pointers
  PCI: proc: Nullify a freed pointer
  drivers: base: Make bus_add_device() public
  drivers: base: Make device_{add|remove}_class_symlinks() public
  drivers: base: Add bus_disconnect_device()
  powerpc/pci: Enable assigning bus numbers instead of reading them from DT
  powerpc/pci: Don't reduce the host bridge bus range
  PCI: Allow expanding the bridges
  PCI: hotplug: Add initial support for movable bus numbers
  PCI: hotplug: movable bus numbers: rename proc and sysfs entries
  PCI: hotplug: movable bus numbers: compact the gaps in numbering

 .../admin-guide/kernel-parameters.txt        |   3 +
 arch/powerpc/kernel/pci-common.c             |   1 -
 arch/powerpc/kernel/pci_dn.c                 |   5 +
 arch/powerpc/platforms/powernv/eeh-powernv.c |   3 +-
 drivers/base/base.h                          |   1 -
 drivers/base/bus.c                           |  37 +++
 drivers/base/core.c                          |   6 +-
 drivers/pci/pci-sysfs.c                      |   7 +-
 drivers/pci/pci.c                            |   3 +
 drivers/pci/pci.h                            |   2 +
 drivers/pci/probe.c                          | 291 +++++++++++++++++-
 drivers/pci/proc.c                           |   1 +
 include/linux/device.h                       |   5 +
 13 files changed, 351 insertions(+), 14 deletions(-)