From patchwork Wed Aug 11 13:52:18 2010
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andre Przywara
X-Patchwork-Id: 118821
From: Andre Przywara
CC: Andre Przywara
Subject: [PATCH 4/4] NUMA: realize NUMA memory pinning
Date: Wed, 11 Aug 2010 15:52:18 +0200
Message-ID: <1281534738-8310-5-git-send-email-andre.przywara@amd.com>
X-Mailer: git-send-email 1.6.4
In-Reply-To: <1281534738-8310-1-git-send-email-andre.przywara@amd.com>
References: <1281534738-8310-1-git-send-email-andre.przywara@amd.com>
Sender: kvm-owner@vger.kernel.org
X-Mailing-List: kvm@vger.kernel.org

diff --git a/hw/pc.c b/hw/pc.c
index 1b24409..dbfc082 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -42,6 +42,15 @@
 #include "device-assignment.h"
 #include "kvm.h"
 
+#ifdef CONFIG_NUMA
+#include <numa.h>
+#include <numaif.h>
+#ifndef MPOL_F_RELATIVE_NODES
+  #define MPOL_F_RELATIVE_NODES (1 << 14)
+  #define MPOL_F_STATIC_NODES   (1 << 15)
+#endif
+#endif
+
 /* output Bochs bios info messages */
 //#define DEBUG_BIOS
 
@@ -882,6 +891,53 @@ void pc_cpus_init(const char *cpu_model)
     }
 }
 
+static void bind_numa(ram_addr_t ram_addr)
+{
+#ifdef CONFIG_NUMA
+    int i;
+    char* ram_ptr;
+    ram_addr_t len, ram_offset;
+    int bind_mode;
+
+    ram_ptr = qemu_get_ram_ptr(ram_addr);
+
+    ram_offset = 0;
+    for (i = 0; i < nb_numa_nodes; i++) {
+        len = numa_info[i].guest_mem;
+        if (numa_info[i].flags != 0) {
+            switch (numa_info[i].flags & NODE_HOST_POLICY_MASK) {
+            case NODE_HOST_BIND:
+                bind_mode = MPOL_BIND;
+                break;
+            case NODE_HOST_INTERLEAVE:
+                bind_mode = MPOL_INTERLEAVE;
+                break;
+            case NODE_HOST_PREFERRED:
+                bind_mode = MPOL_PREFERRED;
+                break;
+            default:
+                bind_mode = MPOL_DEFAULT;
+                break;
+            }
+            bind_mode |= (numa_info[i].flags & NODE_HOST_RELATIVE) ?
+                MPOL_F_RELATIVE_NODES : MPOL_F_STATIC_NODES;
+
+            /* This is a workaround for a long standing bug in Linux'
+             * mbind implementation, which cuts off the last specified
+             * node. To stay compatible should this bug be fixed, we
+             * specify one more node and zero this one out.
+             */
+            clear_bit(numa_num_configured_nodes() + 1, numa_info[i].host_mem);
+            if (mbind(ram_ptr + ram_offset, len, bind_mode,
+                numa_info[i].host_mem, numa_num_configured_nodes() + 1, 0))
+                    perror("mbind");
+        }
+        ram_offset += len;
+    }
+#endif
+    return;
+}
+
 void pc_memory_init(ram_addr_t ram_size,
                     const char *kernel_filename,
                     const char *kernel_cmdline,
@@ -919,6 +975,8 @@ void pc_memory_init(ram_addr_t ram_size,
     cpu_register_physical_memory(0x100000,
                  below_4g_mem_size - 0x100000,
                  ram_addr + 0x100000);
+    bind_numa(ram_addr);
+
 #if TARGET_PHYS_ADDR_BITS > 32
     cpu_register_physical_memory(0x100000000ULL, above_4g_mem_size,
                                  ram_addr + below_4g_mem_size);
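
Note on the mbind workaround: the comment in bind_numa() says one extra node is specified and then zeroed out, so that the last real node survives the kernel cutting off the final position. Below is a minimal, standalone sketch of that mask arithmetic. It does not call mbind(), so it builds without libnuma. Every name in it (sketch_nodemask, sketch_set, sketch_test, sketch_prepare, SKETCH_MAX_NODES) is hypothetical and not part of the patch. It also assumes the padding position is bit `configured`, the first bit past the valid node IDs 0..configured-1. The patch itself passes numa_num_configured_nodes() + 1 to clear_bit(), so take this as one reading of the comment, not as the author's code.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical names for illustration only; none of them appear in the patch. */
#define SKETCH_MAX_NODES 64
#define BITS_PER_WORD    (sizeof(unsigned long) * CHAR_BIT)
/* Room for SKETCH_MAX_NODES node bits plus one padding bit. */
#define MASK_WORDS       ((SKETCH_MAX_NODES + 1 + BITS_PER_WORD - 1) / BITS_PER_WORD)

typedef struct {
    unsigned long bits[MASK_WORDS];
} sketch_nodemask;

/* Mark a host node as allowed in the mask. */
static void sketch_set(sketch_nodemask *m, unsigned int node)
{
    m->bits[node / BITS_PER_WORD] |= 1UL << (node % BITS_PER_WORD);
}

/* Return 1 if the bit for `node` is set, 0 otherwise. */
static int sketch_test(const sketch_nodemask *m, unsigned int node)
{
    return (int)((m->bits[node / BITS_PER_WORD] >> (node % BITS_PER_WORD)) & 1UL);
}

/*
 * Prepare a mask for a host with `configured` nodes (valid IDs are
 * 0..configured-1) and return the value one would pass as the maxnode
 * argument of mbind(): one more than the number of real nodes.
 */
static unsigned long sketch_prepare(sketch_nodemask *m, unsigned int configured)
{
    assert(configured > 0 && configured <= SKETCH_MAX_NODES);

    /* Zero the padding position so it can never select a node, whether
     * or not the kernel drops the last position. */
    m->bits[configured / BITS_PER_WORD] &= ~(1UL << (configured % BITS_PER_WORD));

    return (unsigned long)configured + 1;
}
```

With four configured nodes, sketch_prepare() returns 5 and leaves bits 0..3 untouched. Any stray bit 4 is cleared, so the mask describes the same node set regardless of whether the kernel honours or drops the last position.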