From patchwork Fri Jan 17 06:10:32 2025
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 13942891
Subject: [PATCH 1/4] cxl: Remove the CXL_DECODER_MIXED mistake
From: Dan Williams
To: linux-cxl@vger.kernel.org
Cc: dave.jiang@intel.com
Date: Thu, 16 Jan 2025 22:10:32 -0800
Message-ID: <173709423269.753996.17229236572128350685.stgit@dwillia2-xfh.jf.intel.com>
In-Reply-To: <173709422664.753996.4091585899046900035.stgit@dwillia2-xfh.jf.intel.com>
References: <173709422664.753996.4091585899046900035.stgit@dwillia2-xfh.jf.intel.com>

CXL_DECODER_MIXED is a safety mechanism introduced for the case where
platform firmware has programmed an endpoint decoder that straddles a
DPA partition boundary. While the kernel is careful to only allocate
DPA capacity within a single partition, there is no guarantee that
platform firmware, or anything that touched the device before the
current kernel, gets that right.

However, __cxl_dpa_reserve() will never get to the CXL_DECODER_MIXED
designation because of the way it tracks partition boundaries. A
request_resource() that spans ->ram_res and ->pmem_res fails with the
following signature:

    __cxl_dpa_reserve: cxl_port endpoint15: decoder15.0: failed to reserve allocation

CXL_DECODER_MIXED is dead defensive programming after the driver has
already given up on the device. It has never offered any protection in
practice; just delete it.

Signed-off-by: Dan Williams
Reviewed-by: Alejandro Lucero
Reviewed-by: Ira Weiny
---
 drivers/cxl/core/hdm.c    |  8 ++++----
 drivers/cxl/core/region.c | 12 ------------
 drivers/cxl/cxl.h         |  4 +---
 3 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 28edd5822486..be8556119d94 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -329,12 +329,12 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
 	if (resource_contains(&cxlds->pmem_res, res))
 		cxled->mode = CXL_DECODER_PMEM;
-	else if (resource_contains(&cxlds->ram_res, res))
+	if (resource_contains(&cxlds->ram_res, res))
 		cxled->mode = CXL_DECODER_RAM;
 	else {
-		dev_warn(dev, "decoder%d.%d: %pr mixed mode not supported\n",
-			 port->id, cxled->cxld.id, cxled->dpa_res);
-		cxled->mode = CXL_DECODER_MIXED;
+		dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
+			 port->id, cxled->cxld.id, res);
+		cxled->mode = CXL_DECODER_NONE;
 	}
 
 	port->hdm_end++;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index d77899650798..e4885acac853 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2725,18 +2725,6 @@ static int poison_by_decoder(struct device *dev, void *arg)
 	if (!cxled->dpa_res || !resource_size(cxled->dpa_res))
 		return rc;
 
-	/*
-	 * Regions are only created with single mode decoders: pmem or ram.
-	 * Linux does not support mixed mode decoders. This means that
-	 * reading poison per endpoint decoder adheres to the requirement
-	 * that poison reads of pmem and ram must be separated.
-	 * CXL 3.0 Spec 8.2.9.8.4.1
-	 */
-	if (cxled->mode == CXL_DECODER_MIXED) {
-		dev_dbg(dev, "poison list read unsupported in mixed mode\n");
-		return rc;
-	}
-
 	cxlmd = cxled_to_memdev(cxled);
 	if (cxled->skip) {
 		offset = cxled->dpa_res->start - cxled->skip;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index f6015f24ad38..0fb8d70fa3e5 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -379,7 +379,6 @@ enum cxl_decoder_mode {
 	CXL_DECODER_NONE,
 	CXL_DECODER_RAM,
 	CXL_DECODER_PMEM,
-	CXL_DECODER_MIXED,
 	CXL_DECODER_DEAD,
 };
 
@@ -389,10 +388,9 @@ static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode)
 		[CXL_DECODER_NONE] = "none",
 		[CXL_DECODER_RAM] = "ram",
 		[CXL_DECODER_PMEM] = "pmem",
-		[CXL_DECODER_MIXED] = "mixed",
 	};
 
-	if (mode >= CXL_DECODER_NONE && mode <= CXL_DECODER_MIXED)
+	if (mode >= CXL_DECODER_NONE && mode <= CXL_DECODER_PMEM)
 		return names[mode];
 	return "mixed";
 }

From patchwork Fri Jan 17 06:10:38 2025
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 13942892
Subject: [PATCH 2/4] cxl: Introduce to_{ram,pmem}_{res,perf}() helpers
From: Dan Williams
To: linux-cxl@vger.kernel.org
Cc: Dave Jiang, Alejandro Lucero, Ira Weiny, dave.jiang@intel.com
Date: Thu, 16 Jan 2025 22:10:38 -0800
Message-ID: <173709423850.753996.572292628436250022.stgit@dwillia2-xfh.jf.intel.com>
In-Reply-To: <173709422664.753996.4091585899046900035.stgit@dwillia2-xfh.jf.intel.com>
References: <173709422664.753996.4091585899046900035.stgit@dwillia2-xfh.jf.intel.com>

In preparation for consolidating all DPA partition information into an
array of DPA metadata, introduce helpers that hide the layout of the
current data. I.e. make the eventual replacement of ->ram_res,
->pmem_res, ->ram_perf, and ->pmem_perf with a new DPA metadata array a
no-op for code paths that consume that information, and reduce the
noise of follow-on patches.

The end goal is to consolidate all DPA information in 'struct
cxl_dev_state', but for now the helpers just make it appear that all
DPA metadata is relative to @cxlds.

Note that a follow-on patch also cleans up the temporary placeholders
of @ram_res and @pmem_res in the qos_class manipulation code,
cxl_dpa_alloc(), and cxl_mem_create_range_info().
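[Editor's note: the accessor pattern above can be modeled in a small
userspace sketch. 'struct resource' and 'struct cxl_dev_state' below
are simplified stand-ins for the kernel types, and res_size() is a
local substitute for the kernel's resource_size(); only the helper
shape is taken from the patch, everything else is illustrative.]

```c
#include <assert.h>

/* Simplified stand-in for the kernel's 'struct resource' */
struct resource {
	unsigned long long start;
	unsigned long long end;		/* inclusive, as in the kernel */
};

struct cxl_dev_state {
	/* leading underscore: access only through the helpers below */
	struct resource _ram_res;
	struct resource _pmem_res;
};

/*
 * Callers never touch ->_ram_res/->_pmem_res directly, so a later
 * change can move the storage (e.g. into a partition array) by
 * editing only these helpers, leaving every call site untouched.
 */
static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds)
{
	return &cxlds->_ram_res;
}

static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
{
	return &cxlds->_pmem_res;
}

/* Local substitute for resource_size(): inclusive-end arithmetic */
static inline unsigned long long res_size(const struct resource *res)
{
	return res ? res->end - res->start + 1 : 0;
}

static inline unsigned long long cxl_ram_size(struct cxl_dev_state *cxlds)
{
	return res_size(to_ram_res(cxlds));
}

static inline unsigned long long cxl_pmem_size(struct cxl_dev_state *cxlds)
{
	return res_size(to_pmem_res(cxlds));
}
```

The payoff is exactly what the commit message states: code that asks
"how big is the ram partition?" compiles against cxl_ram_size() and
never learns where the data lives.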
Cc: Dave Jiang
Cc: Alejandro Lucero
Cc: Ira Weiny
Signed-off-by: Dan Williams
---
 drivers/cxl/core/cdat.c      | 70 +++++++++++++++++++++++++-----------------
 drivers/cxl/core/hdm.c       | 26 ++++++++--------
 drivers/cxl/core/mbox.c      | 18 ++++++-----
 drivers/cxl/core/memdev.c    | 42 +++++++++++++------------
 drivers/cxl/core/region.c    | 10 ++++--
 drivers/cxl/cxlmem.h         | 58 ++++++++++++++++++++++++++++++-----
 drivers/cxl/mem.c            |  2 +
 tools/testing/cxl/test/cxl.c | 25 ++++++++-------
 8 files changed, 159 insertions(+), 92 deletions(-)

diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
index 8153f8d83a16..b177a488e29b 100644
--- a/drivers/cxl/core/cdat.c
+++ b/drivers/cxl/core/cdat.c
@@ -258,29 +258,33 @@ static void update_perf_entry(struct device *dev, struct dsmas_entry *dent,
 static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds,
 				     struct xarray *dsmas_xa)
 {
-	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
 	struct device *dev = cxlds->dev;
-	struct range pmem_range = {
-		.start = cxlds->pmem_res.start,
-		.end = cxlds->pmem_res.end,
-	};
-	struct range ram_range = {
-		.start = cxlds->ram_res.start,
-		.end = cxlds->ram_res.end,
-	};
 	struct dsmas_entry *dent;
 	unsigned long index;
+	const struct resource *partition[] = {
+		to_ram_res(cxlds),
+		to_pmem_res(cxlds),
+	};
+	struct cxl_dpa_perf *perf[] = {
+		to_ram_perf(cxlds),
+		to_pmem_perf(cxlds),
+	};
 
 	xa_for_each(dsmas_xa, index, dent) {
-		if (resource_size(&cxlds->ram_res) &&
-		    range_contains(&ram_range, &dent->dpa_range))
-			update_perf_entry(dev, dent, &mds->ram_perf);
-		else if (resource_size(&cxlds->pmem_res) &&
-			 range_contains(&pmem_range, &dent->dpa_range))
-			update_perf_entry(dev, dent, &mds->pmem_perf);
-		else
-			dev_dbg(dev, "no partition for dsmas dpa: %pra\n",
-				&dent->dpa_range);
+		for (int i = 0; i < ARRAY_SIZE(partition); i++) {
+			const struct resource *res = partition[i];
+			struct range range = {
+				.start = res->start,
+				.end = res->end,
+			};
+
+			if (range_contains(&range, &dent->dpa_range))
+				update_perf_entry(dev, dent, perf[i]);
+			else
+				dev_dbg(dev,
+					"no partition for dsmas dpa: %pra\n",
+					&dent->dpa_range);
+		}
 	}
 }
 
@@ -304,6 +308,9 @@ static int match_cxlrd_qos_class(struct device *dev, void *data)
 
 static void reset_dpa_perf(struct cxl_dpa_perf *dpa_perf)
 {
+	if (!dpa_perf)
+		return;
+
 	*dpa_perf = (struct cxl_dpa_perf) {
 		.qos_class = CXL_QOS_CLASS_INVALID,
 	};
@@ -312,6 +319,9 @@ static void reset_dpa_perf(struct cxl_dpa_perf *dpa_perf)
 static bool cxl_qos_match(struct cxl_port *root_port,
 			  struct cxl_dpa_perf *dpa_perf)
 {
+	if (!dpa_perf)
+		return false;
+
 	if (dpa_perf->qos_class == CXL_QOS_CLASS_INVALID)
 		return false;
 
@@ -346,7 +356,8 @@ static int match_cxlrd_hb(struct device *dev, void *data)
 static int cxl_qos_class_verify(struct cxl_memdev *cxlmd)
 {
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
-	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
+	struct cxl_dpa_perf *ram_perf = to_ram_perf(cxlds),
+			    *pmem_perf = to_pmem_perf(cxlds);
 	struct cxl_port *root_port;
 	int rc;
 
@@ -359,17 +370,17 @@ static int cxl_qos_class_verify(struct cxl_memdev *cxlmd)
 	root_port = &cxl_root->port;
 
 	/* Check that the QTG IDs are all sane between end device and root decoders */
-	if (!cxl_qos_match(root_port, &mds->ram_perf))
-		reset_dpa_perf(&mds->ram_perf);
-	if (!cxl_qos_match(root_port, &mds->pmem_perf))
-		reset_dpa_perf(&mds->pmem_perf);
+	if (!cxl_qos_match(root_port, ram_perf))
+		reset_dpa_perf(ram_perf);
+	if (!cxl_qos_match(root_port, pmem_perf))
+		reset_dpa_perf(pmem_perf);
 
 	/* Check to make sure that the device's host bridge is under a root decoder */
 	rc = device_for_each_child(&root_port->dev,
 				   cxlmd->endpoint->host_bridge, match_cxlrd_hb);
 	if (!rc) {
-		reset_dpa_perf(&mds->ram_perf);
-		reset_dpa_perf(&mds->pmem_perf);
+		reset_dpa_perf(ram_perf);
+		reset_dpa_perf(pmem_perf);
 	}
 
 	return rc;
@@ -567,6 +578,9 @@ static bool dpa_perf_contains(struct cxl_dpa_perf *perf,
 		.end = dpa_res->end,
 	};
 
+	if (!perf)
+		return false;
+
 	return range_contains(&perf->dpa_range, &dpa);
 }
 
@@ -574,15 +588,15 @@ static struct cxl_dpa_perf *cxled_get_dpa_perf(struct cxl_endpoint_decoder *cxle
 					       enum cxl_decoder_mode mode)
 {
 	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
-	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
 	struct cxl_dpa_perf *perf;
 
 	switch (mode) {
 	case CXL_DECODER_RAM:
-		perf = &mds->ram_perf;
+		perf = to_ram_perf(cxlds);
 		break;
 	case CXL_DECODER_PMEM:
-		perf = &mds->pmem_perf;
+		perf = to_pmem_perf(cxlds);
 		break;
 	default:
 		return ERR_PTR(-EINVAL);
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index be8556119d94..7a85522294ad 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -327,9 +327,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
 	cxled->dpa_res = res;
 	cxled->skip = skipped;
 
-	if (resource_contains(&cxlds->pmem_res, res))
+	if (resource_contains(to_pmem_res(cxlds), res))
 		cxled->mode = CXL_DECODER_PMEM;
-	if (resource_contains(&cxlds->ram_res, res))
+	else if (resource_contains(to_ram_res(cxlds), res))
 		cxled->mode = CXL_DECODER_RAM;
 	else {
 		dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n",
@@ -442,11 +442,11 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled,
 	 * Only allow modes that are supported by the current partition
 	 * configuration
 	 */
-	if (mode == CXL_DECODER_PMEM && !resource_size(&cxlds->pmem_res)) {
+	if (mode == CXL_DECODER_PMEM && !cxl_pmem_size(cxlds)) {
 		dev_dbg(dev, "no available pmem capacity\n");
 		return -ENXIO;
 	}
-	if (mode == CXL_DECODER_RAM && !resource_size(&cxlds->ram_res)) {
+	if (mode == CXL_DECODER_RAM && !cxl_ram_size(cxlds)) {
 		dev_dbg(dev, "no available ram capacity\n");
 		return -ENXIO;
 	}
@@ -464,6 +464,8 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
 	struct device *dev = &cxled->cxld.dev;
 	resource_size_t start, avail, skip;
 	struct resource *p, *last;
+	const struct resource *ram_res = to_ram_res(cxlds);
+	const struct resource *pmem_res = to_pmem_res(cxlds);
 	int rc;
 
 	down_write(&cxl_dpa_rwsem);
@@ -480,37 +482,37 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
 		goto out;
 	}
 
-	for (p = cxlds->ram_res.child, last = NULL; p; p = p->sibling)
+	for (p = ram_res->child, last = NULL; p; p = p->sibling)
 		last = p;
 	if (last)
 		free_ram_start = last->end + 1;
 	else
-		free_ram_start = cxlds->ram_res.start;
+		free_ram_start = ram_res->start;
 
-	for (p = cxlds->pmem_res.child, last = NULL; p; p = p->sibling)
+	for (p = pmem_res->child, last = NULL; p; p = p->sibling)
 		last = p;
 	if (last)
 		free_pmem_start = last->end + 1;
 	else
-		free_pmem_start = cxlds->pmem_res.start;
+		free_pmem_start = pmem_res->start;
 
 	if (cxled->mode == CXL_DECODER_RAM) {
 		start = free_ram_start;
-		avail = cxlds->ram_res.end - start + 1;
+		avail = ram_res->end - start + 1;
 		skip = 0;
 	} else if (cxled->mode == CXL_DECODER_PMEM) {
 		resource_size_t skip_start, skip_end;
 
 		start = free_pmem_start;
-		avail = cxlds->pmem_res.end - start + 1;
+		avail = pmem_res->end - start + 1;
 		skip_start = free_ram_start;
 
 		/*
 		 * If some pmem is already allocated, then that allocation
 		 * already handled the skip.
 		 */
-		if (cxlds->pmem_res.child &&
-		    skip_start == cxlds->pmem_res.child->start)
+		if (pmem_res->child &&
+		    skip_start == pmem_res->child->start)
 			skip_end = skip_start - 1;
 		else
 			skip_end = start - 1;
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 548564c770c0..3502f1633ad2 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1270,24 +1270,26 @@ static int add_dpa_res(struct device *dev, struct resource *parent,
 int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
 {
 	struct cxl_dev_state *cxlds = &mds->cxlds;
+	struct resource *ram_res = to_ram_res(cxlds);
+	struct resource *pmem_res = to_pmem_res(cxlds);
 	struct device *dev = cxlds->dev;
 	int rc;
 
 	if (!cxlds->media_ready) {
 		cxlds->dpa_res = DEFINE_RES_MEM(0, 0);
-		cxlds->ram_res = DEFINE_RES_MEM(0, 0);
-		cxlds->pmem_res = DEFINE_RES_MEM(0, 0);
+		*ram_res = DEFINE_RES_MEM(0, 0);
+		*pmem_res = DEFINE_RES_MEM(0, 0);
 		return 0;
 	}
 
 	cxlds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes);
 
 	if (mds->partition_align_bytes == 0) {
-		rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0,
+		rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
 				 mds->volatile_only_bytes, "ram");
 		if (rc)
 			return rc;
-		return add_dpa_res(dev, &cxlds->dpa_res, &cxlds->pmem_res,
+		return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
 				   mds->volatile_only_bytes,
 				   mds->persistent_only_bytes, "pmem");
 	}
@@ -1298,11 +1300,11 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds)
 		return rc;
 	}
 
-	rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->ram_res, 0,
+	rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0,
 			 mds->active_volatile_bytes, "ram");
 	if (rc)
 		return rc;
-	return add_dpa_res(dev, &cxlds->dpa_res, &cxlds->pmem_res,
+	return add_dpa_res(dev, &cxlds->dpa_res, pmem_res,
 			   mds->active_volatile_bytes,
 			   mds->active_persistent_bytes, "pmem");
 }
@@ -1450,8 +1452,8 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
 	mds->cxlds.reg_map.host = dev;
 	mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
 	mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
-	mds->ram_perf.qos_class = CXL_QOS_CLASS_INVALID;
-	mds->pmem_perf.qos_class = CXL_QOS_CLASS_INVALID;
+	to_ram_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID;
+	to_pmem_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID;
 
 	return mds;
 }
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index ae3dfcbe8938..c5f8320ed330 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -80,7 +80,7 @@ static ssize_t ram_size_show(struct device *dev, struct device_attribute *attr,
 {
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
-	unsigned long long len = resource_size(&cxlds->ram_res);
+	unsigned long long len = resource_size(to_ram_res(cxlds));
 
 	return sysfs_emit(buf, "%#llx\n", len);
 }
@@ -93,7 +93,7 @@ static ssize_t pmem_size_show(struct device *dev, struct device_attribute *attr,
 {
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
-	unsigned long long len = resource_size(&cxlds->pmem_res);
+	unsigned long long len = cxl_pmem_size(cxlds);
 
 	return sysfs_emit(buf, "%#llx\n", len);
 }
@@ -198,16 +198,20 @@ static int cxl_get_poison_by_memdev(struct cxl_memdev *cxlmd)
 	int rc = 0;
 
 	/* CXL 3.0 Spec 8.2.9.8.4.1 Separate pmem and ram poison requests */
-	if (resource_size(&cxlds->pmem_res)) {
-		offset = cxlds->pmem_res.start;
-		length = resource_size(&cxlds->pmem_res);
+	if (cxl_pmem_size(cxlds)) {
+		const struct resource *res = to_pmem_res(cxlds);
+
+		offset = res->start;
+		length = resource_size(res);
 		rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
 		if (rc)
 			return rc;
 	}
-	if (resource_size(&cxlds->ram_res)) {
-		offset = cxlds->ram_res.start;
-		length = resource_size(&cxlds->ram_res);
+	if (cxl_ram_size(cxlds)) {
+		const struct resource *res = to_ram_res(cxlds);
+
+		offset = res->start;
+		length = resource_size(res);
 		rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
 		/*
 		 * Invalid Physical Address is not an error for
@@ -409,9 +413,8 @@ static ssize_t pmem_qos_class_show(struct device *dev,
 {
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
-	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
 
-	return sysfs_emit(buf, "%d\n", mds->pmem_perf.qos_class);
+	return sysfs_emit(buf, "%d\n", to_pmem_perf(cxlds)->qos_class);
 }
 
 static struct device_attribute dev_attr_pmem_qos_class =
@@ -428,9 +431,8 @@ static ssize_t ram_qos_class_show(struct device *dev,
 {
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
-	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
 
-	return sysfs_emit(buf, "%d\n", mds->ram_perf.qos_class);
+	return sysfs_emit(buf, "%d\n", to_ram_perf(cxlds)->qos_class);
 }
 
 static struct device_attribute dev_attr_ram_qos_class =
@@ -466,11 +468,11 @@ static umode_t cxl_ram_visible(struct kobject *kobj, struct attribute *a, int n)
 {
 	struct device *dev = kobj_to_dev(kobj);
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
-	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+	struct cxl_dpa_perf *perf = to_ram_perf(cxlmd->cxlds);
 
-	if (a == &dev_attr_ram_qos_class.attr)
-		if (mds->ram_perf.qos_class == CXL_QOS_CLASS_INVALID)
-			return 0;
+	if (a == &dev_attr_ram_qos_class.attr &&
+	    (!perf || perf->qos_class == CXL_QOS_CLASS_INVALID))
+		return 0;
 
 	return a->mode;
 }
@@ -485,11 +487,11 @@ static umode_t cxl_pmem_visible(struct kobject *kobj, struct attribute *a, int n
 {
 	struct device *dev = kobj_to_dev(kobj);
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
-	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+	struct cxl_dpa_perf *perf = to_pmem_perf(cxlmd->cxlds);
 
-	if (a == &dev_attr_pmem_qos_class.attr)
-		if (mds->pmem_perf.qos_class == CXL_QOS_CLASS_INVALID)
-			return 0;
+	if (a == &dev_attr_pmem_qos_class.attr &&
+	    (!perf || perf->qos_class == CXL_QOS_CLASS_INVALID))
+		return 0;
 
 	return a->mode;
 }
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e4885acac853..9f0f6fdbc841 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2688,7 +2688,7 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
 	if (ctx->mode == CXL_DECODER_RAM) {
 		offset = ctx->offset;
-		length = resource_size(&cxlds->ram_res) - offset;
+		length = cxl_ram_size(cxlds) - offset;
 		rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
 		if (rc == -EFAULT)
 			rc = 0;
@@ -2700,9 +2700,11 @@ static int cxl_get_poison_unmapped(struct cxl_memdev *cxlmd,
 		length = resource_size(&cxlds->dpa_res) - offset;
 		if (!length)
 			return 0;
-	} else if (resource_size(&cxlds->pmem_res)) {
-		offset = cxlds->pmem_res.start;
-		length = resource_size(&cxlds->pmem_res);
+	} else if (cxl_pmem_size(cxlds)) {
+		const struct resource *res = to_pmem_res(cxlds);
+
+		offset = res->start;
+		length = resource_size(res);
 	} else {
 		return 0;
 	}
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 2a25d1957ddb..78e92e24d7b5 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -423,8 +423,8 @@ struct cxl_dpa_perf {
 * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
 * @media_ready: Indicate whether the device media is usable
 * @dpa_res: Overall DPA resource tree for the device
- * @pmem_res: Active Persistent memory capacity configuration
- * @ram_res: Active Volatile memory capacity configuration
+ * @_pmem_res: Active Persistent memory capacity configuration
+ * @_ram_res: Active Volatile memory capacity configuration
 * @serial: PCIe Device Serial Number
 * @type: Generic Memory Class device or Vendor Specific Memory device
 * @cxl_mbox: CXL mailbox context
@@ -438,13 +438,41 @@ struct cxl_dev_state {
 	bool rcd;
 	bool media_ready;
 	struct resource dpa_res;
-	struct resource pmem_res;
-	struct resource ram_res;
+	struct resource _pmem_res;
+	struct resource _ram_res;
 	u64 serial;
 	enum cxl_devtype type;
 	struct cxl_mailbox cxl_mbox;
 };
 
+static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds)
+{
+	return &cxlds->_ram_res;
+}
+
+static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds)
+{
+	return &cxlds->_pmem_res;
+}
+
+static inline resource_size_t cxl_ram_size(struct cxl_dev_state *cxlds)
+{
+	const struct resource *res = to_ram_res(cxlds);
+
+	if (!res)
+		return 0;
+	return resource_size(res);
+}
+
+static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
+{
+	const struct resource *res = to_pmem_res(cxlds);
+
+	if (!res)
+		return 0;
+	return resource_size(res);
+}
+
 static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
 {
 	return dev_get_drvdata(cxl_mbox->host);
@@ -471,8 +499,8 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
 * @active_persistent_bytes: sum of hard + soft persistent
 * @next_volatile_bytes: volatile capacity change pending device reset
 * @next_persistent_bytes: persistent capacity change pending device reset
- * @ram_perf: performance data entry matched to RAM partition
- * @pmem_perf: performance data entry matched to PMEM partition
+ * @_ram_perf: performance data entry matched to RAM partition
+ * @_pmem_perf: performance data entry matched to PMEM partition
 * @event: event log driver state
 * @poison: poison driver state info
 * @security: security driver state info
@@ -496,8 +524,8 @@ struct cxl_memdev_state {
 	u64 next_volatile_bytes;
 	u64 next_persistent_bytes;
 
-	struct cxl_dpa_perf ram_perf;
-	struct cxl_dpa_perf pmem_perf;
+	struct cxl_dpa_perf _ram_perf;
+	struct cxl_dpa_perf _pmem_perf;
 
 	struct cxl_event_state event;
 	struct cxl_poison_state poison;
@@ -505,6 +533,20 @@ struct cxl_memdev_state {
 	struct cxl_fw_state fw;
 };
 
+static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds)
+{
+	struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds);
+
+	return &mds->_ram_perf;
+}
+
+static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds)
+{
+	struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds);
+
+	return &mds->_pmem_perf;
+}
+
 static inline struct cxl_memdev_state *
 to_cxl_memdev_state(struct cxl_dev_state *cxlds)
 {
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 2f03a4d5606e..9675243bd05b 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -152,7 +152,7 @@ static int cxl_mem_probe(struct device *dev)
 		return -ENXIO;
 	}
 
-	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM)) {
+	if (cxl_pmem_size(cxlds) && IS_ENABLED(CONFIG_CXL_PMEM)) {
 		rc = devm_cxl_add_nvdimm(parent_port, cxlmd);
 		if (rc) {
 			if (rc == -ENODEV)
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index d0337c11f9ee..7f1c5061307b 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -1000,25 +1000,28 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
 		find_cxl_root(port);
 	struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
-	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
 	struct access_coordinate ep_c[ACCESS_COORDINATE_MAX];
-	struct range pmem_range = {
-		.start = cxlds->pmem_res.start,
-		.end = cxlds->pmem_res.end,
+	const struct resource *partition[] = {
+		to_ram_res(cxlds),
+		to_pmem_res(cxlds),
 	};
-	struct range ram_range = {
-		.start = cxlds->ram_res.start,
-		.end = cxlds->ram_res.end,
+	struct cxl_dpa_perf *perf[] = {
+		to_ram_perf(cxlds),
+		to_pmem_perf(cxlds),
 	};
 
 	if (!cxl_root)
 		return;
 
-	if (range_len(&ram_range))
-		dpa_perf_setup(port, &ram_range, &mds->ram_perf);
+	for (int i = 0; i < ARRAY_SIZE(partition); i++) {
+		const struct resource *res = partition[i];
+		struct range range = {
+			.start = res->start,
+			.end = res->end,
+		};
 
-	if (range_len(&pmem_range))
-		dpa_perf_setup(port, &pmem_range, &mds->pmem_perf);
+		dpa_perf_setup(port, &range, perf[i]);
+	}
 
 	cxl_memdev_update_perf(cxlmd);

From patchwork Fri Jan 17 06:10:44 2025
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 13942893
bh=/+HbGrhAMAtO32DPaRFtvPvHIHOUo3YkkMTmUJQXYPA=; b=XxUUvKxu+SdiWnefF46gY3HPOW5OonVxFsxOswDZDuBDLaP4cpTaqZcs U0ryhOiEaTnf0Nc7+O0WjmnPkFezB3/7eJ9h3kPL+OiZTSd3UW0W4kN6R GrqNOkfZNWxQjMoGPhckowbZXw+lWVBTIA2+dTsfPj3lz9BA+nAGdlC6G RRuXYSv/FkCpiIMc2DOS9egyQYGXOEptuv5nBXIKIm8QMtaEYCoxZ5/Dl ZefUl65ybjkB7ZmA6zm5ZW69hnxI/ZGy1Wnjv/3HPA1PMQFw4Eptkx9Dx B79DuamYm024/wLEWejsc6dgK6CKn+Vf4SnX8JavKPaHcIONkEDH77LYf g==; X-CSE-ConnectionGUID: HZajRvisRh6GPCOTH10MJg== X-CSE-MsgGUID: Go/OGPgNR5u4L3GdyICrAQ== X-IronPort-AV: E=McAfee;i="6700,10204,11317"; a="41193793" X-IronPort-AV: E=Sophos;i="6.13,211,1732608000"; d="scan'208";a="41193793" Received: from fmviesa003.fm.intel.com ([10.60.135.143]) by orvoesa107.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Jan 2025 22:10:45 -0800 X-CSE-ConnectionGUID: k+b2rn5HRA+dI+qofuIucg== X-CSE-MsgGUID: VGU10dL6QFG35N19cXiEjQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,224,1728975600"; d="scan'208";a="109789703" Received: from aschofie-mobl2.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.125.109.114]) by fmviesa003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Jan 2025 22:10:45 -0800 Subject: [PATCH 3/4] cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info' From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Dave Jiang , Alejandro Lucero , Ira Weiny , dave.jiang@intel.com Date: Thu, 16 Jan 2025 22:10:44 -0800 Message-ID: <173709424415.753996.10761098712604763500.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <173709422664.753996.4091585899046900035.stgit@dwillia2-xfh.jf.intel.com> References: <173709422664.753996.4091585899046900035.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c Precedence: bulk X-Mailing-List: linux-cxl@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The pending efforts to add CXL Accelerator (type-2) device [1], and Dynamic Capacity (DCD) support [2], tripped on the no-longer-fit-for-purpose design in the CXL subsystem for 
tracking device-physical-address (DPA) metadata. Trip hazards include:

- CXL Memory Devices need to consider a PMEM partition, but Accelerator devices with CXL.mem likely do not in the common case.
- CXL Memory Devices enumerate DPA through Memory Device mailbox commands like Partition Info; Accelerator devices do not.
- CXL Memory Devices that support DCD support more than 2 partitions. Some of the driver algorithms are awkward to expand to > 2 partition cases.
- DPA performance data is a general capability that can be shared with accelerators, so tracking it in 'struct cxl_memdev_state' is no longer suitable.
- 'enum cxl_decoder_mode' is sometimes a partition id and sometimes a memory property; it should be phased out in favor of a partition id, with the memory property coming from the partition info.

Towards cleaning up those issues and allowing a smoother landing for the aforementioned pending efforts, introduce a 'struct cxl_dpa_partition' array in 'struct cxl_dev_state', and 'struct cxl_range_info' as a shared way for Memory Devices and Accelerators to initialize the DPA information in 'struct cxl_dev_state'.

For now, split a new cxl_dpa_setup() from cxl_mem_create_range_info() to get the new data structure initialized, and clean up some qos_class initialization. Follow-on patches will go further, using the new data structure to clean up algorithms that are better suited to loop over all possible partitions.

cxl_dpa_setup() follows the locking expectations for mutating the device DPA map, and is suitable for Accelerator drivers to use. Accelerators likely only have one hardcoded 'ram' partition to convey to the cxl_core.
Link: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com [1] Link: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com [2] Cc: Dave Jiang Cc: Alejandro Lucero Cc: Ira Weiny Signed-off-by: Dan Williams --- drivers/cxl/core/cdat.c | 15 ++----- drivers/cxl/core/hdm.c | 69 ++++++++++++++++++++++++++++++++++ drivers/cxl/core/mbox.c | 86 ++++++++++++++++++------------------------ drivers/cxl/cxlmem.h | 79 +++++++++++++++++++++++++-------------- drivers/cxl/pci.c | 7 +++ tools/testing/cxl/test/cxl.c | 15 ++----- tools/testing/cxl/test/mem.c | 7 +++ 7 files changed, 176 insertions(+), 102 deletions(-) diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c index b177a488e29b..5400a421ad30 100644 --- a/drivers/cxl/core/cdat.c +++ b/drivers/cxl/core/cdat.c @@ -261,25 +261,18 @@ static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds, struct device *dev = cxlds->dev; struct dsmas_entry *dent; unsigned long index; - const struct resource *partition[] = { - to_ram_res(cxlds), - to_pmem_res(cxlds), - }; - struct cxl_dpa_perf *perf[] = { - to_ram_perf(cxlds), - to_pmem_perf(cxlds), - }; xa_for_each(dsmas_xa, index, dent) { - for (int i = 0; i < ARRAY_SIZE(partition); i++) { - const struct resource *res = partition[i]; + for (int i = 0; i < cxlds->nr_partitions; i++) { + struct resource *res = &cxlds->part[i].res; struct range range = { .start = res->start, .end = res->end, }; if (range_contains(&range, &dent->dpa_range)) - update_perf_entry(dev, dent, perf[i]); + update_perf_entry(dev, dent, + &cxlds->part[i].perf); else dev_dbg(dev, "no partition for dsmas dpa: %pra\n", diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 7a85522294ad..7e1559b3ed88 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -342,6 +342,75 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, return 0; } +static int add_dpa_res(struct device *dev, struct resource *parent, + struct resource 
*res, resource_size_t start, + resource_size_t size, const char *type) +{ + int rc; + + *res = (struct resource) { + .name = type, + .start = start, + .end = start + size - 1, + .flags = IORESOURCE_MEM, + }; + if (resource_size(res) == 0) { + dev_dbg(dev, "DPA(%s): no capacity\n", res->name); + return 0; + } + rc = request_resource(parent, res); + if (rc) { + dev_err(dev, "DPA(%s): failed to track %pr (%d)\n", res->name, + res, rc); + return rc; + } + + dev_dbg(dev, "DPA(%s): %pr\n", res->name, res); + + return 0; +} + +/* if this fails the caller must destroy @cxlds, there is no recovery */ +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info) +{ + struct device *dev = cxlds->dev; + + guard(rwsem_write)(&cxl_dpa_rwsem); + + if (cxlds->nr_partitions) + return -EBUSY; + + if (!info->size || !info->nr_partitions) { + cxlds->dpa_res = DEFINE_RES_MEM(0, 0); + cxlds->nr_partitions = 0; + return 0; + } + + cxlds->dpa_res = DEFINE_RES_MEM(0, info->size); + + for (int i = 0; i < info->nr_partitions; i++) { + const char *desc; + int rc; + + if (i == CXL_PARTITION_RAM) + desc = "ram"; + else if (i == CXL_PARTITION_PMEM) + desc = "pmem"; + else + desc = ""; + cxlds->part[i].perf.qos_class = CXL_QOS_CLASS_INVALID; + rc = add_dpa_res(dev, &cxlds->dpa_res, &cxlds->part[i].res, + info->range[i].start, + range_len(&info->range[i]), desc); + if (rc) + return rc; + cxlds->nr_partitions++; + } + + return 0; +} +EXPORT_SYMBOL_GPL(cxl_dpa_setup); + int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, resource_size_t base, resource_size_t len, resource_size_t skipped) diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c index 3502f1633ad2..7dca5c8c3494 100644 --- a/drivers/cxl/core/mbox.c +++ b/drivers/cxl/core/mbox.c @@ -1241,57 +1241,36 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd) return rc; } -static int add_dpa_res(struct device *dev, struct resource *parent, - struct resource *res, resource_size_t start, - resource_size_t 
size, const char *type) -{ - int rc; - - res->name = type; - res->start = start; - res->end = start + size - 1; - res->flags = IORESOURCE_MEM; - if (resource_size(res) == 0) { - dev_dbg(dev, "DPA(%s): no capacity\n", res->name); - return 0; - } - rc = request_resource(parent, res); - if (rc) { - dev_err(dev, "DPA(%s): failed to track %pr (%d)\n", res->name, - res, rc); - return rc; - } - - dev_dbg(dev, "DPA(%s): %pr\n", res->name, res); - - return 0; -} - -int cxl_mem_create_range_info(struct cxl_memdev_state *mds) +int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info) { struct cxl_dev_state *cxlds = &mds->cxlds; - struct resource *ram_res = to_ram_res(cxlds); - struct resource *pmem_res = to_pmem_res(cxlds); struct device *dev = cxlds->dev; int rc; if (!cxlds->media_ready) { - cxlds->dpa_res = DEFINE_RES_MEM(0, 0); - *ram_res = DEFINE_RES_MEM(0, 0); - *pmem_res = DEFINE_RES_MEM(0, 0); + info->size = 0; return 0; } - cxlds->dpa_res = DEFINE_RES_MEM(0, mds->total_bytes); + info->size = mds->total_bytes; if (mds->partition_align_bytes == 0) { - rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0, - mds->volatile_only_bytes, "ram"); - if (rc) - return rc; - return add_dpa_res(dev, &cxlds->dpa_res, pmem_res, - mds->volatile_only_bytes, - mds->persistent_only_bytes, "pmem"); + info->range[CXL_PARTITION_RAM] = (struct range) { + .start = 0, + .end = mds->volatile_only_bytes - 1, + }; + info->nr_partitions++; + + if (!mds->persistent_only_bytes) + return 0; + + info->range[CXL_PARTITION_PMEM] = (struct range) { + .start = mds->volatile_only_bytes, + .end = mds->volatile_only_bytes + + mds->persistent_only_bytes - 1, + }; + info->nr_partitions++; + return 0; } rc = cxl_mem_get_partition_info(mds); @@ -1300,15 +1279,24 @@ int cxl_mem_create_range_info(struct cxl_memdev_state *mds) return rc; } - rc = add_dpa_res(dev, &cxlds->dpa_res, ram_res, 0, - mds->active_volatile_bytes, "ram"); - if (rc) - return rc; - return add_dpa_res(dev, &cxlds->dpa_res, 
pmem_res, - mds->active_volatile_bytes, - mds->active_persistent_bytes, "pmem"); + info->range[CXL_PARTITION_RAM] = (struct range) { + .start = 0, + .end = mds->active_volatile_bytes - 1, + }; + info->nr_partitions++; + + if (!mds->active_persistent_bytes) + return 0; + + info->range[CXL_PARTITION_PMEM] = (struct range) { + .start = mds->active_volatile_bytes, + .end = mds->active_volatile_bytes + mds->active_persistent_bytes - 1, + }; + info->nr_partitions++; + + return 0; } -EXPORT_SYMBOL_NS_GPL(cxl_mem_create_range_info, "CXL"); +EXPORT_SYMBOL_NS_GPL(cxl_mem_dpa_fetch, "CXL"); int cxl_set_timestamp(struct cxl_memdev_state *mds) { @@ -1452,8 +1440,6 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev) mds->cxlds.reg_map.host = dev; mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE; mds->cxlds.type = CXL_DEVTYPE_CLASSMEM; - to_ram_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID; - to_pmem_perf(&mds->cxlds)->qos_class = CXL_QOS_CLASS_INVALID; return mds; } diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index 78e92e24d7b5..2e728d4b7327 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -97,6 +97,20 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, resource_size_t base, resource_size_t len, resource_size_t skipped); +/* Well known, spec defined partition indices */ +enum cxl_partition { + CXL_PARTITION_RAM, + CXL_PARTITION_PMEM, + CXL_PARTITION_MAX, +}; + +struct cxl_dpa_info { + u64 size; + struct range range[CXL_PARTITION_MAX]; + int nr_partitions; +}; +int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info); + static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port, struct cxl_memdev *cxlmd) { @@ -408,6 +422,16 @@ struct cxl_dpa_perf { int qos_class; }; +/** + * struct cxl_dpa_partition - DPA partition descriptor + * @res: shortcut to the partition in the DPA resource tree (cxlds->dpa_res) + * @perf: performance attributes of the partition from CDAT + */ +struct 
cxl_dpa_partition { + struct resource res; + struct cxl_dpa_perf perf; +}; + /** * struct cxl_dev_state - The driver device state * @@ -423,8 +447,8 @@ struct cxl_dpa_perf { * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH) * @media_ready: Indicate whether the device media is usable * @dpa_res: Overall DPA resource tree for the device - * @_pmem_res: Active Persistent memory capacity configuration - * @_ram_res: Active Volatile memory capacity configuration + * @part: DPA partition array + * @nr_partitions: Number of DPA partitions * @serial: PCIe Device Serial Number * @type: Generic Memory Class device or Vendor Specific Memory device * @cxl_mbox: CXL mailbox context @@ -438,21 +462,39 @@ struct cxl_dev_state { bool rcd; bool media_ready; struct resource dpa_res; - struct resource _pmem_res; - struct resource _ram_res; + struct cxl_dpa_partition part[CXL_PARTITION_MAX]; + unsigned int nr_partitions; u64 serial; enum cxl_devtype type; struct cxl_mailbox cxl_mbox; }; -static inline struct resource *to_ram_res(struct cxl_dev_state *cxlds) +static inline const struct resource *to_ram_res(struct cxl_dev_state *cxlds) { - return &cxlds->_ram_res; + if (cxlds->nr_partitions > 0) + return &cxlds->part[CXL_PARTITION_RAM].res; + return NULL; } -static inline struct resource *to_pmem_res(struct cxl_dev_state *cxlds) +static inline const struct resource *to_pmem_res(struct cxl_dev_state *cxlds) { - return &cxlds->_pmem_res; + if (cxlds->nr_partitions > 1) + return &cxlds->part[CXL_PARTITION_PMEM].res; + return NULL; +} + +static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds) +{ + if (cxlds->nr_partitions > 0) + return &cxlds->part[CXL_PARTITION_RAM].perf; + return NULL; +} + +static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds) +{ + if (cxlds->nr_partitions > 1) + return &cxlds->part[CXL_PARTITION_PMEM].perf; + return NULL; } static inline resource_size_t cxl_ram_size(struct cxl_dev_state *cxlds) @@ 
-499,8 +541,6 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox) * @active_persistent_bytes: sum of hard + soft persistent * @next_volatile_bytes: volatile capacity change pending device reset * @next_persistent_bytes: persistent capacity change pending device reset - * @_ram_perf: performance data entry matched to RAM partition - * @_pmem_perf: performance data entry matched to PMEM partition * @event: event log driver state * @poison: poison driver state info * @security: security driver state info @@ -524,29 +564,12 @@ struct cxl_memdev_state { u64 next_volatile_bytes; u64 next_persistent_bytes; - struct cxl_dpa_perf _ram_perf; - struct cxl_dpa_perf _pmem_perf; - struct cxl_event_state event; struct cxl_poison_state poison; struct cxl_security_state security; struct cxl_fw_state fw; }; -static inline struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds) -{ - struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds); - - return &mds->_ram_perf; -} - -static inline struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds) -{ - struct cxl_memdev_state *mds = container_of(cxlds, typeof(*mds), cxlds); - - return &mds->_pmem_perf; -} - static inline struct cxl_memdev_state * to_cxl_memdev_state(struct cxl_dev_state *cxlds) { @@ -860,7 +883,7 @@ int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox, int cxl_dev_state_identify(struct cxl_memdev_state *mds); int cxl_await_media_ready(struct cxl_dev_state *cxlds); int cxl_enumerate_cmds(struct cxl_memdev_state *mds); -int cxl_mem_create_range_info(struct cxl_memdev_state *mds); +int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info); struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev); void set_exclusive_cxl_commands(struct cxl_memdev_state *mds, unsigned long *cmds); diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index 0241d1d7133a..47dbfe406236 100644 --- a/drivers/cxl/pci.c +++ b/drivers/cxl/pci.c @@ -900,6 +900,7 
@@ __ATTRIBUTE_GROUPS(cxl_rcd); static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) { struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus); + struct cxl_dpa_info range_info = { 0 }; struct cxl_memdev_state *mds; struct cxl_dev_state *cxlds; struct cxl_register_map map; @@ -989,7 +990,11 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) if (rc) return rc; - rc = cxl_mem_create_range_info(mds); + rc = cxl_mem_dpa_fetch(mds, &range_info); + if (rc) + return rc; + + rc = cxl_dpa_setup(cxlds, &range_info); if (rc) return rc; diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c index 7f1c5061307b..ba3d48b37de3 100644 --- a/tools/testing/cxl/test/cxl.c +++ b/tools/testing/cxl/test/cxl.c @@ -1001,26 +1001,19 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port) struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev); struct cxl_dev_state *cxlds = cxlmd->cxlds; struct access_coordinate ep_c[ACCESS_COORDINATE_MAX]; - const struct resource *partition[] = { - to_ram_res(cxlds), - to_pmem_res(cxlds), - }; - struct cxl_dpa_perf *perf[] = { - to_ram_perf(cxlds), - to_pmem_perf(cxlds), - }; if (!cxl_root) return; - for (int i = 0; i < ARRAY_SIZE(partition); i++) { - const struct resource *res = partition[i]; + for (int i = 0; i < cxlds->nr_partitions; i++) { + struct resource *res = &cxlds->part[i].res; + struct cxl_dpa_perf *perf = &cxlds->part[i].perf; struct range range = { .start = res->start, .end = res->end, }; - dpa_perf_setup(port, &range, perf[i]); + dpa_perf_setup(port, &range, perf); } cxl_memdev_update_perf(cxlmd); diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c index 347c1e7b37bd..ed365e083c8f 100644 --- a/tools/testing/cxl/test/mem.c +++ b/tools/testing/cxl/test/mem.c @@ -1477,6 +1477,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev) struct cxl_dev_state *cxlds; struct cxl_mockmem_data *mdata; struct cxl_mailbox 
*cxl_mbox; + struct cxl_dpa_info range_info = { 0 }; int rc; mdata = devm_kzalloc(dev, sizeof(*mdata), GFP_KERNEL); @@ -1537,7 +1538,11 @@ static int cxl_mock_mem_probe(struct platform_device *pdev) if (rc) return rc; - rc = cxl_mem_create_range_info(mds); + rc = cxl_mem_dpa_fetch(mds, &range_info); + if (rc) + return rc; + + rc = cxl_dpa_setup(cxlds, &range_info); if (rc) return rc;
From patchwork Fri Jan 17 06:10:50 2025
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 13942894
Subject: [PATCH 4/4] cxl: Make cxl_dpa_alloc() DPA partition number agnostic
From: Dan Williams
To: linux-cxl@vger.kernel.org
Cc: Dave Jiang , Alejandro Lucero , Ira Weiny , dave.jiang@intel.com
Date: Thu, 16 Jan 2025 22:10:50 -0800
Message-ID: <173709425022.753996.16667967718406367188.stgit@dwillia2-xfh.jf.intel.com>
In-Reply-To:
<173709422664.753996.4091585899046900035.stgit@dwillia2-xfh.jf.intel.com>
References: <173709422664.753996.4091585899046900035.stgit@dwillia2-xfh.jf.intel.com>

cxl_dpa_alloc() is a hard-coded nest of assumptions around PMEM allocations being distinct from RAM allocations in specific ways, when in practice the allocation rules are only relative to DPA partition index.

The rules for cxl_dpa_alloc() are:

- allocations can only come from one partition
- if allocating at partition-index-N, all free space in partitions less than partition-index-N must be skipped over

Use the new 'struct cxl_dpa_partition' array to support allocation with an arbitrary number of DPA partitions on the device.

A follow-on patch can go further to clean up the 'enum cxl_decoder_mode' concept and supersede it by looking up the memory properties from partition metadata.

Cc: Dave Jiang , Alejandro Lucero , Ira Weiny , dave.jiang@intel.com
Signed-off-by: Dan Williams
---
 drivers/cxl/core/hdm.c | 167 +++++++++++++++++++++++++++++++++---------------
 drivers/cxl/cxlmem.h | 9 +++
 2 files changed, 125 insertions(+), 51 deletions(-)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 7e1559b3ed88..4a2816102a1e 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -223,6 +223,30 @@ void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds) } EXPORT_SYMBOL_NS_GPL(cxl_dpa_debug, "CXL"); +static void release_skip(struct cxl_dev_state *cxlds, + const resource_size_t skip_base, + const resource_size_t skip_len) +{ + resource_size_t skip_start = skip_base, skip_rem = skip_len; + + for (int i = 0; i < cxlds->nr_partitions; i++) { + const struct resource *part_res = &cxlds->part[i].res; + resource_size_t skip_end, skip_size; + + if (skip_start < part_res->start || skip_start > part_res->end) + continue; + + skip_end = min(part_res->end, skip_start
+ skip_rem - 1); + skip_size = skip_end - skip_start + 1; + __release_region(&cxlds->dpa_res, skip_start, skip_size); + skip_start += skip_size; + skip_rem -= skip_size; + + if (!skip_rem) + break; + } +} + /* * Must be called in a context that synchronizes against this decoder's * port ->remove() callback (like an endpoint decoder sysfs attribute) @@ -241,7 +265,7 @@ static void __cxl_dpa_release(struct cxl_endpoint_decoder *cxled) skip_start = res->start - cxled->skip; __release_region(&cxlds->dpa_res, res->start, resource_size(res)); if (cxled->skip) - __release_region(&cxlds->dpa_res, skip_start, cxled->skip); + release_skip(cxlds, skip_start, cxled->skip); cxled->skip = 0; cxled->dpa_res = NULL; put_device(&cxled->cxld.dev); @@ -268,6 +292,47 @@ static void devm_cxl_dpa_release(struct cxl_endpoint_decoder *cxled) __cxl_dpa_release(cxled); } +static int request_skip(struct cxl_dev_state *cxlds, + struct cxl_endpoint_decoder *cxled, + const resource_size_t skip_base, + const resource_size_t skip_len) +{ + resource_size_t skip_start = skip_base, skip_rem = skip_len; + + for (int i = 0; i < cxlds->nr_partitions; i++) { + const struct resource *part_res = &cxlds->part[i].res; + struct cxl_port *port = cxled_to_port(cxled); + resource_size_t skip_end, skip_size; + struct resource *res; + + if (skip_start < part_res->start || skip_start > part_res->end) + continue; + + skip_end = min(part_res->end, skip_start + skip_rem - 1); + skip_size = skip_end - skip_start + 1; + + res = __request_region(&cxlds->dpa_res, skip_start, skip_size, + dev_name(&cxled->cxld.dev), 0); + if (!res) { + dev_dbg(cxlds->dev, + "decoder%d.%d: failed to reserve skipped space\n", + port->id, cxled->cxld.id); + break; + } + skip_start += skip_size; + skip_rem -= skip_size; + if (!skip_rem) + break; + } + + if (skip_rem == 0) + return 0; + + release_skip(cxlds, skip_base, skip_len - skip_rem); + + return -EBUSY; +} + static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, resource_size_t 
base, resource_size_t len, resource_size_t skipped) @@ -277,6 +342,7 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, struct cxl_dev_state *cxlds = cxlmd->cxlds; struct device *dev = &port->dev; struct resource *res; + int rc; lockdep_assert_held_write(&cxl_dpa_rwsem); @@ -305,14 +371,9 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, } if (skipped) { - res = __request_region(&cxlds->dpa_res, base - skipped, skipped, - dev_name(&cxled->cxld.dev), 0); - if (!res) { - dev_dbg(dev, - "decoder%d.%d: failed to reserve skipped space\n", - port->id, cxled->cxld.id); - return -EBUSY; - } + rc = request_skip(cxlds, cxled, base - skipped, skipped); + if (rc) + return rc; } res = __request_region(&cxlds->dpa_res, base, len, dev_name(&cxled->cxld.dev), 0); @@ -320,16 +381,15 @@ static int __cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled, dev_dbg(dev, "decoder%d.%d: failed to reserve allocation\n", port->id, cxled->cxld.id); if (skipped) - __release_region(&cxlds->dpa_res, base - skipped, - skipped); + release_skip(cxlds, base - skipped, skipped); return -EBUSY; } cxled->dpa_res = res; cxled->skip = skipped; - if (resource_contains(to_pmem_res(cxlds), res)) + if (cxl_partition_contains(cxlds, CXL_PARTITION_PMEM, res)) cxled->mode = CXL_DECODER_PMEM; - else if (resource_contains(to_ram_res(cxlds), res)) + else if (cxl_partition_contains(cxlds, CXL_PARTITION_RAM, res)) cxled->mode = CXL_DECODER_RAM; else { dev_warn(dev, "decoder%d.%d: %pr does not map any partition\n", @@ -527,15 +587,13 @@ int cxl_dpa_set_mode(struct cxl_endpoint_decoder *cxled, int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size) { struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); - resource_size_t free_ram_start, free_pmem_start; struct cxl_port *port = cxled_to_port(cxled); struct cxl_dev_state *cxlds = cxlmd->cxlds; struct device *dev = &cxled->cxld.dev; - resource_size_t start, avail, skip; + struct resource *res, *prev = NULL; + 
resource_size_t start, avail, skip, skip_start; struct resource *p, *last; - const struct resource *ram_res = to_ram_res(cxlds); - const struct resource *pmem_res = to_pmem_res(cxlds); - int rc; + int part, rc; down_write(&cxl_dpa_rwsem); if (cxled->cxld.region) { @@ -551,47 +609,54 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size) goto out; } - for (p = ram_res->child, last = NULL; p; p = p->sibling) - last = p; - if (last) - free_ram_start = last->end + 1; + if (cxled->mode == CXL_DECODER_RAM) + part = CXL_PARTITION_RAM; + else if (cxled->mode == CXL_DECODER_PMEM) + part = CXL_PARTITION_PMEM; else - free_ram_start = ram_res->start; + part = cxlds->nr_partitions; + + if (part >= cxlds->nr_partitions) { + dev_dbg(dev, "partition %d not found\n", part); + rc = -EBUSY; + goto out; + } + + res = &cxlds->part[part].res; - for (p = pmem_res->child, last = NULL; p; p = p->sibling) + for (p = res->child, last = NULL; p; p = p->sibling) last = p; if (last) - free_pmem_start = last->end + 1; + start = last->end + 1; else - free_pmem_start = pmem_res->start; + start = res->start; - if (cxled->mode == CXL_DECODER_RAM) { - start = free_ram_start; - avail = ram_res->end - start + 1; - skip = 0; - } else if (cxled->mode == CXL_DECODER_PMEM) { - resource_size_t skip_start, skip_end; - - start = free_pmem_start; - avail = pmem_res->end - start + 1; - skip_start = free_ram_start; - - /* - * If some pmem is already allocated, then that allocation - * already handled the skip. - */ - if (pmem_res->child && - skip_start == pmem_res->child->start) - skip_end = skip_start - 1; - else - skip_end = start - 1; - skip = skip_end - skip_start + 1; - } else { - dev_dbg(dev, "mode not set\n"); - rc = -EINVAL; - goto out; + /* + * To allocate at partition N, a skip needs to be calculated for all + * unallocated space at lower partitions indices. 
+ * + * If a partition has any allocations, the search can end because a + * previous cxl_dpa_alloc() invocation is assumed to have accounted for + * all previous partitions. + */ + skip_start = CXL_RESOURCE_NONE; + for (int i = part; i; i--) { + prev = &cxlds->part[i - 1].res; + for (p = prev->child, last = NULL; p; p = p->sibling) + last = p; + if (last) { + skip_start = last->end + 1; + break; + } + skip_start = prev->start; } + avail = res->end - start + 1; + if (skip_start == CXL_RESOURCE_NONE) + skip = 0; + else + skip = res->start - skip_start; + if (size > avail) { dev_dbg(dev, "%pa exceeds available %s capacity: %pa\n", &size, cxl_decoder_mode_name(cxled->mode), &avail); diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index 2e728d4b7327..43acd48b300f 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -515,6 +515,15 @@ static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds) return resource_size(res); } +static inline bool cxl_partition_contains(struct cxl_dev_state *cxlds, + unsigned int part, + struct resource *res) +{ + if (part >= cxlds->nr_partitions) + return false; + return resource_contains(&cxlds->part[part].res, res); +} + static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox) { return dev_get_drvdata(cxl_mbox->host);