From patchwork Wed Sep 4 00:11:52 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 13789514 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7287363C for ; Wed, 4 Sep 2024 00:13:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725408803; cv=none; b=jhgRhjOl53e5FErPc9rOGcDRZCKm8ps6MaGb6cepPvqV4sak+kG7vRbS4f6xij1xpW7lvKZR1iwvQaWJOPQgkEXnZ5hjvG3yq1vAm2UA1IZu3XpQ5sq/u1IH1xIwdFMBnLsHgbHvH2ITsygDthqJfMRn3X8jzxVkMzEXppsDSWU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725408803; c=relaxed/simple; bh=WFWPbRqSh+wXPZg0OwOlrCFokXF4HObJ2TvWkrqqBoE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=MyLUsQcDIm/vxH72JOyFHG8IHGUaMXs33YgGhE3ug3cUDVPl8xdrdKz0shGbcfJbUJs8dhO/2/dOlsGbNLflRLbN62hamMmqhuLBJZ/jO5Liy3hC9GP8R+WmY+OhUKYRx4Gh5eCoa9SieFO92yJRgisdEjnaXGqSeELFiwqmBeg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 315E6C4CEC4; Wed, 4 Sep 2024 00:13:23 +0000 (UTC) From: Dave Jiang To: linux-cxl@vger.kernel.org Cc: dan.j.williams@intel.com, ira.weiny@intel.com, vishal.l.verma@intel.com, alison.schofield@intel.com, Jonathan.Cameron@huawei.com, dave@stgolabs.net Subject: [PATCH v8 3/3] cxl: Add documentation to explain the shared link bandwidth calculation Date: Tue, 3 Sep 2024 17:11:52 -0700 Message-ID: <20240904001316.1688225-4-dave.jiang@intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20240904001316.1688225-1-dave.jiang@intel.com> References: <20240904001316.1688225-1-dave.jiang@intel.com> Precedence: bulk X-Mailing-List: linux-cxl@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Create a kernel documentation to describe how the CXL shared upstream link bandwidth is calculated. Suggested-by: Dan Williams Signed-off-by: Dave Jiang Reviewed-by: Alison Schofield --- v8: - spelling fixups (Jonathan) - remove duplications (Jonathan) - clarify definition of symmetry (Jonathan) --- .../driver-api/cxl/access-coordinates.rst | 92 +++++++++++++++++++ Documentation/driver-api/cxl/index.rst | 1 + 2 files changed, 93 insertions(+) create mode 100644 Documentation/driver-api/cxl/access-coordinates.rst diff --git a/Documentation/driver-api/cxl/access-coordinates.rst b/Documentation/driver-api/cxl/access-coordinates.rst new file mode 100644 index 000000000000..319a820f37a0 --- /dev/null +++ b/Documentation/driver-api/cxl/access-coordinates.rst @@ -0,0 +1,92 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. include:: + +================================== +CXL Access Coordinates Computation +================================== + +Shared Upstream Link Calculation +================================ +For certain CXL region construction with endpoints behind CXL switches (SW) or +Root Ports (RP), there is the possibility of the total bandwidth for all +the endpoints behind a switch being more than the switch upstream link. +A similar situation can occur within the host, upstream of the root ports. +The CXL driver performs an additional pass after all the targets have +arrived for a region in order to recalculate the bandwidths with possible +upstream link being a limiting factor in mind. + +The algorithm assumes the configuration is a symmetric topology as that +maximizes performance. When asymmetric topology is detected, the calculation +is aborted. An asymmetric topology is detected during topology walk where the +number of RPs detected as a grandparent is not equal to the number of devices +iterated in the same iteration loop. The assumption is made that subtle +asymmetry in properties does not happen and all paths to EPs are equal. + +There can be multiple switches under an RP. There can be multiple RPs under +a CXL Host Bridge (HB). There can be multiple HBs under a CXL Fixed Memory +Window Structure (CFMWS). + +An example hierarchy: + + CFMWS 0 + | + _________|_________ + | | + ACPI0017-0 ACPI0017-1 + GP0/HB0/ACPI0016-0 GP1/HB1/ACPI0016-1 + | | | | + RP0 RP1 RP2 RP3 + | | | | + SW 0 SW 1 SW 2 SW 3 + | | | | | | | | + EP0 EP1 EP2 EP3 EP4 EP5 EP6 EP7 + +Computation for the example hierarchy: + +Min (GP0 to CPU BW, + Min(SW 0 Upstream Link to RP0 BW, + Min(SW0SSLBIS for SW0DSP0 (EP0), EP0 DSLBIS, EP0 Upstream Link) + + Min(SW0SSLBIS for SW0DSP1 (EP1), EP1 DSLBIS, EP1 Upstream link)) + + Min(SW 1 Upstream Link to RP1 BW, + Min(SW1SSLBIS for SW1DSP0 (EP2), EP2 DSLBIS, EP2 Upstream Link) + + Min(SW1SSLBIS for SW1DSP1 (EP3), EP3 DSLBIS, EP3 Upstream link))) + +Min (GP1 to CPU BW, + Min(SW 2 Upstream Link to RP2 BW, + Min(SW2SSLBIS for SW2DSP0 (EP4), EP4 DSLBIS, EP4 Upstream Link) + + Min(SW2SSLBIS for SW2DSP1 (EP5), EP5 DSLBIS, EP5 Upstream link)) + + Min(SW 3 Upstream Link to RP3 BW, + Min(SW3SSLBIS for SW3DSP0 (EP6), EP6 DSLBIS, EP6 Upstream Link) + + Min(SW3SSLBIS for SW3DSP1 (EP7), EP7 DSLBIS, EP7 Upstream link)))) + + +The calculation starts at cxl_region_shared_upstream_perf_update(). A xarray +is created to collect all the endpoint bandwidths via the +cxl_endpoint_gather_bandwidth() function. The min() of bandwidth from the +endpoint CDAT and the upstream link bandwidth is calculated. If the endpoint +has a CXL switch as a parent, then min() of calculated bandwidth and the +bandwidth from the SSLBIS for the switch downstream port that is associated +with the endpoint is calculated. The final bandwidth is stored in a +'struct cxl_perf_ctx' in the xarray indexed by a device pointer. If the +endpoint is direct attached to a root port (RP), the device pointer would be an +RP device. If the endpoint is behind a switch, the device pointer would be the +upstream device of the parent switch. + +At the next stage, the code walks through one or more switches if they exist +in the topology. For endpoints directly attached to RPs, this step is skipped. +If there is another switch upstream, the code takes the min() of the current +gathered bandwidth and the upstream link bandwidth. If there's a switch +upstream, then the SSLBIS of the upstream switch. + +Once the topology walk reaches the RP, whether it's direct attached endpoints +or walking through the switch(es), cxl_rp_gather_bandwidth() is called. At +this point all the bandwidths are aggregated per each host bridge, which is +also the index for the resulting xarray. + +The next step is to take the min() of the per host bridge bandwidth and the +bandwidth from the Generic Port (GP). The bandwidths for the GP is retrieved +via ACPI tables SRAT/HMAT. The min bandwidth are aggregated under the same +ACPI0017 device to form a new xarray. + +Finally, the cxl_region_update_bandwidth() is called and the aggregated +bandwidth from all the members of the last xarray is updated for the +access coordinates residing in the cxl region (cxlr) context. diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst index 12b82725d322..965ba90e8fb7 100644 --- a/Documentation/driver-api/cxl/index.rst +++ b/Documentation/driver-api/cxl/index.rst @@ -8,6 +8,7 @@ Compute Express Link :maxdepth: 1 memory-devices + access-coordinates maturity-map