From patchwork Sun Nov 20 01:37:49 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 13049880
Subject: [PATCH] device-dax: Fix duplicate 'hmem' device registration
From: Dan Williams
To: nvdimm@lists.linux.dev
Cc: Tallam Mahendra Kumar, Mustafa Hajeer, Vishal Verma,
 linux-mm@kvack.org, linux-cxl@vger.kernel.org
Date: Sat, 19 Nov 2022 17:37:49 -0800
Message-ID: <166890823379.4183293.15333502171004313377.stgit@dwillia2-xfh.jf.intel.com>
User-Agent: StGit/0.18-3-g996c

So-called "soft-reserved" memory is an EFI conventional memory range
with the EFI_MEMORY_SP attribute set. That attribute indicates that the
memory is not part of the platform general-purpose memory pool and may
warrant some consideration from the system administrator about whether
to keep that memory set aside for dedicated access through device-dax
(map a device file), or assign it to the page allocator as another
general-purpose memory node target.

Absent an ACPI HMAT table the default device-dax registration creates
coarse-grained devices that are delineated by EFI Memory Map entries.
With the HMAT the devices are delineated by the finer-grained ranges
associated with the proximity domain of the memory target. I.e. the HMAT
describes the properties of performance-differentiated memory and each
unique performance description results in a unique target proximity
domain where each memory proximity domain has an associated SRAT entry
that delineates the address range.

The intent was that SRAT-defined device-dax instances are registered
first. Then any leftover address range with the EFI_MEMORY_SP
attribute, but not covered by the SRAT, would have a coarse-grained
device-dax instance established. However, the scheme to detect what
ranges are left to be assigned to a device was buggy and resulted in
multiple overlapping device-dax instances. Fix this by using explicit
tracking for which ranges have been handled.

Now, this new approach may leave memory stranded in the presence of
broken platform firmware that fails to fully describe all EFI_MEMORY_SP
ranges in the HMAT. That requires a deeper fix if it becomes a problem
in practice.

Reported-by: "Tallam Mahendra Kumar"
Reported-by: Mustafa Hajeer
Debugged-by: Vishal Verma
Tested-by: Vishal Verma
Signed-off-by: Dan Williams
---

I plan to take this through the nvdimm tree with some other dax / HMAT
related fixups.

 drivers/dax/hmem/device.c |   24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
index 97086fab698e..903325aac991 100644
--- a/drivers/dax/hmem/device.c
+++ b/drivers/dax/hmem/device.c
@@ -8,6 +8,13 @@
 static bool nohmem;
 module_param_named(disable, nohmem, bool, 0444);
 
+static struct resource hmem_active = {
+	.name = "HMEM devices",
+	.start = 0,
+	.end = -1,
+	.flags = IORESOURCE_MEM,
+};
+
 void hmem_register_device(int target_nid, struct resource *r)
 {
 	/* define a clean / non-busy resource for the platform device */
@@ -41,6 +48,12 @@ void hmem_register_device(int target_nid, struct resource *r)
 		goto out_pdev;
 	}
 
+	if (!__request_region(&hmem_active, res.start, resource_size(&res),
+			      dev_name(&pdev->dev), 0)) {
+		dev_dbg(&pdev->dev, "hmem range %pr already active\n", &res);
+		goto out_active;
+	}
+
 	pdev->dev.numa_node = numa_map_to_online_node(target_nid);
 	info = (struct memregion_info) {
 		.target_node = target_nid,
@@ -66,6 +79,8 @@ void hmem_register_device(int target_nid, struct resource *r)
 	return;
 
 out_resource:
+	__release_region(&hmem_active, res.start, resource_size(&res));
+out_active:
 	platform_device_put(pdev);
 out_pdev:
 	memregion_free(id);
@@ -73,15 +88,6 @@ void hmem_register_device(int target_nid, struct resource *r)
 
 static __init int hmem_register_one(struct resource *res, void *data)
 {
-	/*
-	 * If the resource is not a top-level resource it was already
-	 * assigned to a device by the HMAT parsing.
-	 */
-	if (res->parent != &iomem_resource) {
-		pr_info("HMEM: skip %pr, already claimed\n", res);
-		return 0;
-	}
-
 	hmem_register_device(phys_to_target_node(res->start), res);
 
 	return 0;
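
For readers less familiar with the resource tree, the "explicit tracking" idea in this patch can be sketched in plain userspace C. This is only an illustration and not the kernel implementation: claim_range(), release_range(), struct range and MAX_ACTIVE are hypothetical names, and a fixed-size array stands in for the private hmem_active root that the patch passes to __request_region(). The property it tries to show is that a range can be claimed once, and any later registration that overlaps an already-claimed range is refused.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the private "HMEM devices" resource root. */
#define MAX_ACTIVE 16

struct range {
	uint64_t start;
	uint64_t end;	/* inclusive, as in struct resource */
};

static struct range active[MAX_ACTIVE];
static int nr_active;

/*
 * Claim [start, end]. Fails if the range is malformed, the table is
 * full, or the range overlaps one that is already claimed.
 */
static bool claim_range(uint64_t start, uint64_t end)
{
	if (start > end || nr_active == MAX_ACTIVE)
		return false;

	for (int i = 0; i < nr_active; i++) {
		/* Two inclusive ranges overlap when each starts before the other ends. */
		if (start <= active[i].end && active[i].start <= end)
			return false;
	}

	active[nr_active++] = (struct range){ .start = start, .end = end };
	return true;
}

/* Drop an exact, previously claimed range, e.g. on a later setup error. */
static void release_range(uint64_t start, uint64_t end)
{
	for (int i = 0; i < nr_active; i++) {
		if (active[i].start == start && active[i].end == end) {
			/* Order does not matter, so move the last entry into the hole. */
			active[i] = active[--nr_active];
			return;
		}
	}
}
```

In this sketch, a first registration of an SRAT-delineated range succeeds, a later coarse-grained registration that overlaps it is rejected instead of creating a second device, and releasing the first range (the equivalent of the out_resource unwind path) makes that address space claimable again.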