From patchwork Mon Jan  6 12:06:59 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bruno Faccini
X-Patchwork-Id: 13927240
From: Bruno Faccini
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, rppt@kernel.org,
 david@redhat.com, ziy@nvidia.com, jhubbard@nvidia.com, mrusiniak@nvidia.com,
 Bruno Faccini
Subject: [PATCH 1/1] mm/fake-numa: allow later numa node hotplug
Date: Mon, 6 Jan 2025 04:06:59 -0800
Message-ID: <20250106120659.359610-2-bfaccini@nvidia.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20250106120659.359610-1-bfaccini@nvidia.com>
References: <20250106120659.359610-1-bfaccini@nvidia.com>

The current fake-numa implementation prevents new NUMA nodes from being
hot-plugged later by drivers. A common symptom of this limitation is the
"node was absent from the node_possible_map" message emitted by the
associated warning in mm/memory_hotplug.c: add_memory_resource().

This comes from the lack of remapping in both the pxm_to_node_map[] and
node_to_pxm_map[] tables to take fake-numa nodes into account, which
causes collisions with the original, physical-nodes-only mapping that
was determined from the BIOS tables.

This patch fixes this by doing the necessary node-ID translation in both
the pxm_to_node_map[] and node_to_pxm_map[] tables. The node_distance[]
table has also been fixed accordingly.

Signed-off-by: Bruno Faccini
---
 drivers/acpi/numa/srat.c     | 86 ++++++++++++++++++++++++++++++++++++
 include/acpi/acpi_numa.h     |  5 +++
 include/linux/numa_memblks.h |  3 ++
 mm/numa_emulation.c          | 45 ++++++++++++++++---
 mm/numa_memblks.c            |  2 +-
 5 files changed, 133 insertions(+), 8 deletions(-)

diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
index bec0dcd1f9c3..59fffe34c9d0 100644
--- a/drivers/acpi/numa/srat.c
+++ b/drivers/acpi/numa/srat.c
@@ -81,6 +81,92 @@ int acpi_map_pxm_to_node(int pxm)
 }
 EXPORT_SYMBOL(acpi_map_pxm_to_node);
 
+#ifdef CONFIG_NUMA_EMU
+/*
+ * Take the max_nid + 1 fake-numa nodes into account in both
+ * pxm_to_node_map[]/node_to_pxm_map[] tables.
+ */
+int __init fix_pxm_node_maps(int max_nid)
+{
+	static int pxm_to_node_map_copy[MAX_PXM_DOMAINS] __initdata
+			= { [0 ... MAX_PXM_DOMAINS - 1] = NUMA_NO_NODE };
+	static int node_to_pxm_map_copy[MAX_NUMNODES] __initdata
+			= { [0 ... MAX_NUMNODES - 1] = PXM_INVAL };
+	int i, j, index = -1, count = 0;
+	nodemask_t nodes_to_enable;
+
+	if (numa_off || srat_disabled())
+		return -1;
+
+	/* find fake nodes PXM mapping */
+	for (i = 0; i < MAX_NUMNODES; i++) {
+		if (node_to_pxm_map[i] != PXM_INVAL) {
+			for (j = 0; j <= max_nid; j++) {
+				if ((emu_nid_to_phys[j] == i) &&
+				    WARN(node_to_pxm_map_copy[j] != PXM_INVAL,
+					 "Node %d is already bound to PXM %d\n",
+					 j, node_to_pxm_map_copy[j]))
+					return -1;
+				if (emu_nid_to_phys[j] == i) {
+					node_to_pxm_map_copy[j] =
+						node_to_pxm_map[i];
+					if (j > index)
+						index = j;
+					count++;
+				}
+			}
+		}
+	}
+	if (WARN(index != max_nid, "%d max nid when expected %d\n",
+		 index, max_nid))
+		return -1;
+
+	nodes_clear(nodes_to_enable);
+
+	/* map phys nodes not used for fake nodes */
+	for (i = 0; i < MAX_NUMNODES; i++) {
+		if (node_to_pxm_map[i] != PXM_INVAL) {
+			for (j = 0; j <= max_nid; j++)
+				if (emu_nid_to_phys[j] == i)
+					break;
+			/* fake nodes PXM mapping has been done */
+			if (j <= max_nid)
+				continue;
+			/* find first hole */
+			for (j = 0;
+			     j < MAX_NUMNODES &&
+			     node_to_pxm_map_copy[j] != PXM_INVAL;
+			     j++)
+				;
+			if (WARN(j == MAX_NUMNODES,
+				 "Number of nodes exceeds MAX_NUMNODES\n"))
+				return -1;
+			node_to_pxm_map_copy[j] = node_to_pxm_map[i];
+			node_set(j, nodes_to_enable);
+			count++;
+		}
+	}
+
+	/* creating reverse mapping in pxm_to_node_map[] */
+	for (i = 0; i < MAX_NUMNODES; i++)
+		if (node_to_pxm_map_copy[i] != PXM_INVAL &&
+		    pxm_to_node_map_copy[node_to_pxm_map_copy[i]] == NUMA_NO_NODE)
+			pxm_to_node_map_copy[node_to_pxm_map_copy[i]] = i;
+
+	/* overwrite with new mapping */
+	for (i = 0; i < MAX_NUMNODES; i++) {
+		node_to_pxm_map[i] = node_to_pxm_map_copy[i];
+		pxm_to_node_map[i] = pxm_to_node_map_copy[i];
+	}
+
+	/* enable other nodes found in PXM for hotplug */
+	nodes_or(numa_nodes_parsed, nodes_to_enable, numa_nodes_parsed);
+
+	pr_debug("found %d total number of nodes\n", count);
+	return 0;
+}
+#endif
+
 static void __init
 acpi_table_print_srat_entry(struct acpi_subtable_header *header)
 {
diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h
index b5f594754a9e..99b960bd473c 100644
--- a/include/acpi/acpi_numa.h
+++ b/include/acpi/acpi_numa.h
@@ -17,11 +17,16 @@ extern int node_to_pxm(int);
 extern int acpi_map_pxm_to_node(int);
 extern unsigned char acpi_srat_revision;
 extern void disable_srat(void);
+extern int fix_pxm_node_maps(int max_nid);
 
 extern void bad_srat(void);
 extern int srat_disabled(void);
 
 #else				/* CONFIG_ACPI_NUMA */
+static inline int fix_pxm_node_maps(int max_nid)
+{
+	return 0;
+}
 static inline void disable_srat(void)
 {
 }
diff --git a/include/linux/numa_memblks.h b/include/linux/numa_memblks.h
index cfad6ce7e1bd..dd85613cdd86 100644
--- a/include/linux/numa_memblks.h
+++ b/include/linux/numa_memblks.h
@@ -29,7 +29,10 @@ int __init numa_cleanup_meminfo(struct numa_meminfo *mi);
 int __init numa_memblks_init(int (*init_func)(void),
 			     bool memblock_force_top_down);
 
+extern int numa_distance_cnt;
+
 #ifdef CONFIG_NUMA_EMU
+extern int emu_nid_to_phys[MAX_NUMNODES];
 int numa_emu_cmdline(char *str);
 void __init numa_emu_update_cpu_to_node(int *emu_nid_to_phys,
 					unsigned int nr_emu_nids);
diff --git a/mm/numa_emulation.c b/mm/numa_emulation.c
index 031fb9961bf7..9d55679d99ce 100644
--- a/mm/numa_emulation.c
+++ b/mm/numa_emulation.c
@@ -8,11 +8,12 @@
 #include
 #include
 #include
+#include
 
 #define FAKE_NODE_MIN_SIZE	((u64)32 << 20)
 #define FAKE_NODE_MIN_HASH_MASK	(~(FAKE_NODE_MIN_SIZE - 1UL))
 
-static int emu_nid_to_phys[MAX_NUMNODES];
+int emu_nid_to_phys[MAX_NUMNODES];
 static char *emu_cmdline __initdata;
 
 int __init numa_emu_cmdline(char *str)
@@ -379,6 +380,7 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
 	size_t phys_size = numa_dist_cnt * numa_dist_cnt * sizeof(phys_dist[0]);
 	int max_emu_nid, dfl_phys_nid;
 	int i, j, ret;
+	nodemask_t physnode_mask = numa_nodes_parsed;
 
 	if (!emu_cmdline)
 		goto no_emu;
@@ -395,7 +397,6 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
 	 * split the system RAM into N fake nodes.
 	 */
 	if (strchr(emu_cmdline, 'U')) {
-		nodemask_t physnode_mask = numa_nodes_parsed;
 		unsigned long n;
 		int nid = 0;
 
@@ -465,9 +466,6 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
 	 */
 	max_emu_nid = setup_emu2phys_nid(&dfl_phys_nid);
 
-	/* commit */
-	*numa_meminfo = ei;
-
 	/* Make sure numa_nodes_parsed only contains emulated nodes */
 	nodes_clear(numa_nodes_parsed);
 	for (i = 0; i < ARRAY_SIZE(ei.blk); i++)
@@ -475,10 +473,21 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
 		    ei.blk[i].nid != NUMA_NO_NODE)
 			node_set(ei.blk[i].nid, numa_nodes_parsed);
 
-	numa_emu_update_cpu_to_node(emu_nid_to_phys, ARRAY_SIZE(emu_nid_to_phys));
+	/* fix pxm_to_node_map[] and node_to_pxm_map[] to avoid collision
+	 * with faked numa nodes, particularly during later memory hotplug
+	 * handling, and also update numa_nodes_parsed accordingly.
+	 */
+	ret = fix_pxm_node_maps(max_emu_nid);
+	if (ret < 0)
+		goto no_emu;
+
+	/* commit */
+	*numa_meminfo = ei;
+
+	numa_emu_update_cpu_to_node(emu_nid_to_phys, max_emu_nid + 1);
 
 	/* make sure all emulated nodes are mapped to a physical node */
-	for (i = 0; i < ARRAY_SIZE(emu_nid_to_phys); i++)
+	for (i = 0; i < max_emu_nid + 1; i++)
 		if (emu_nid_to_phys[i] == NUMA_NO_NODE)
 			emu_nid_to_phys[i] = dfl_phys_nid;
 
@@ -501,12 +510,34 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
 			numa_set_distance(i, j, dist);
 		}
 	}
+	for (i = 0; i < numa_distance_cnt; i++) {
+		for (j = 0; j < numa_distance_cnt; j++) {
+			int physi, physj;
+			u8 dist;
+
+			/* distance between fake nodes is already ok */
+			if (emu_nid_to_phys[i] != NUMA_NO_NODE &&
+			    emu_nid_to_phys[j] != NUMA_NO_NODE)
+				continue;
+			if (emu_nid_to_phys[i] != NUMA_NO_NODE)
+				physi = emu_nid_to_phys[i];
+			else
+				physi = i - max_emu_nid;
+			if (emu_nid_to_phys[j] != NUMA_NO_NODE)
+				physj = emu_nid_to_phys[j];
+			else
+				physj = j - max_emu_nid;
+			dist = phys_dist[physi * numa_dist_cnt + physj];
+			numa_set_distance(i, j, dist);
+		}
+	}
 
 	/* free the copied physical distance table */
 	memblock_free(phys_dist, phys_size);
 	return;
 
 no_emu:
+	numa_nodes_parsed = physnode_mask;
 	/* No emulation. Build identity emu_nid_to_phys[] for numa_add_cpu() */
 	for (i = 0; i < ARRAY_SIZE(emu_nid_to_phys); i++)
 		emu_nid_to_phys[i] = i;
diff --git a/mm/numa_memblks.c b/mm/numa_memblks.c
index a3877e9bc878..ff4054f4334d 100644
--- a/mm/numa_memblks.c
+++ b/mm/numa_memblks.c
@@ -7,7 +7,7 @@
 #include
 #include
 
-static int numa_distance_cnt;
+int numa_distance_cnt;
 static u8 *numa_distance;
 
 nodemask_t numa_nodes_parsed __initdata;