From patchwork Sat Sep 1 11:28:21 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Fengguang Wu
X-Patchwork-Id: 10584943
Message-Id: <20180901124811.591511876@intel.com>
User-Agent: quilt/0.63-1
Date: Sat, 01 Sep 2018 19:28:21 +0800
From: Fengguang Wu
To: Andrew Morton
Cc: Linux Memory Management List, Peng DongX, Fengguang Wu
Cc: Liu Jingqi
Cc: Dong Eddie
Cc: Dave Hansen
Cc: Huang Ying
Cc: Brendan Gregg
Cc: kvm@vger.kernel.org
Cc: LKML
Subject: [RFC][PATCH 3/5] kvm-ept-idle: HVA indexed EPT read
References: <20180901112818.126790961@intel.com>
Content-Disposition: inline; filename=0003-kvm-ept-idle-HVA-indexed-EPT-read.patch

For virtual machines, "accessed" bits are set in the guest page tables
and in EPT/NPT. So for a qemu-kvm process, convert HVA to GFN to GPA,
then walk the EPT/NPT.

Thanks to the linear HVA-GPA mapping within each memslot, the conversion
can be done efficiently, outside of the page table walk loops.

In this manner, we provide a uniform interface for both virtual machines
and normal processes.

The intended use case is per-task/VM working set tracking and migration.
This is convenient for applying policies at task/VMA and VM granularity.

Signed-off-by: Peng DongX
Signed-off-by: Fengguang Wu
---
 arch/x86/kvm/ept_idle.c | 118 ++++++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/ept_idle.h |  24 ++++++++++
 2 files changed, 142 insertions(+)
 create mode 100644 arch/x86/kvm/ept_idle.c
 create mode 100644 arch/x86/kvm/ept_idle.h

diff --git a/arch/x86/kvm/ept_idle.c b/arch/x86/kvm/ept_idle.c
new file mode 100644
index 000000000000..5b97dd01011b
--- /dev/null
+++ b/arch/x86/kvm/ept_idle.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "ept_idle.h"
+
+
+// mindless copy from kvm_handle_hva_range().
+// TODO: handle order and hole.
+static int ept_idle_walk_hva_range(struct ept_idle_ctrl *eic,
+				   unsigned long start,
+				   unsigned long end)
+{
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *memslot;
+	int ret = 0;
+
+	slots = kvm_memslots(eic->kvm);
+	kvm_for_each_memslot(memslot, slots) {
+		unsigned long hva_start, hva_end;
+		gfn_t gfn_start, gfn_end;
+
+		hva_start = max(start, memslot->userspace_addr);
+		hva_end = min(end, memslot->userspace_addr +
+			      (memslot->npages << PAGE_SHIFT));
+		if (hva_start >= hva_end)
+			continue;
+		/*
+		 * {gfn(page) | page intersects with [hva_start, hva_end)} =
+		 * {gfn_start, gfn_start+1, ..., gfn_end-1}.
+		 */
+		gfn_start = hva_to_gfn_memslot(hva_start, memslot);
+		gfn_end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, memslot);
+
+		ret = ept_idle_walk_gfn_range(eic, gfn_start, gfn_end);
+		if (ret)
+			return ret;
+	}
+
+	return ret;
+}
+
+static ssize_t ept_idle_read(struct file *file, char *buf,
+			     size_t count, loff_t *ppos)
+{
+	struct task_struct *task = file->private_data;
+	struct ept_idle_ctrl *eic;
+	unsigned long hva_start = *ppos << BITMAP_BYTE2PVA_SHIFT;
+	unsigned long hva_end = hva_start + (count << BITMAP_BYTE2PVA_SHIFT);
+	int ret;
+
+	if (*ppos % IDLE_BITMAP_CHUNK_SIZE ||
+	    count % IDLE_BITMAP_CHUNK_SIZE)
+		return -EINVAL;
+
+	eic = kzalloc(sizeof(*eic), GFP_KERNEL);
+	if (!eic)
+		return -EBUSY;
+
+	eic->buf = buf;
+	eic->buf_size = count;
+	eic->kvm = task_kvm(task);
+	if (!eic->kvm) {
+		ret = -EINVAL;
+		goto out_free;
+	}
+
+	ret = ept_idle_walk_hva_range(eic, hva_start, hva_end);
+	if (ret)
+		goto out_free;
+
+	ret = eic->bytes_copied;
+	*ppos += ret;
+out_free:
+	kfree(eic);
+
+	return ret;
+}
+
+static int ept_idle_open(struct inode *inode, struct file *file)
+{
+	if (!try_module_get(THIS_MODULE))
+		return -EBUSY;
+
+	return 0;
+}
+
+static int ept_idle_release(struct inode *inode, struct file *file)
+{
+	module_put(THIS_MODULE);
+	return 0;
+}
+
+extern struct file_operations proc_ept_idle_operations;
+
+static int ept_idle_entry(void)
+{
+	proc_ept_idle_operations.owner = THIS_MODULE;
+	proc_ept_idle_operations.read = ept_idle_read;
+	proc_ept_idle_operations.open = ept_idle_open;
+	proc_ept_idle_operations.release = ept_idle_release;
+
+	return 0;
+}
+
+static void ept_idle_exit(void)
+{
+	memset(&proc_ept_idle_operations, 0, sizeof(proc_ept_idle_operations));
+}
+
+MODULE_LICENSE("GPL");
+module_init(ept_idle_entry);
+module_exit(ept_idle_exit);
diff --git a/arch/x86/kvm/ept_idle.h b/arch/x86/kvm/ept_idle.h
new file mode 100644
index 000000000000..e0b9dcecf50b
--- /dev/null
+++ b/arch/x86/kvm/ept_idle.h
@@ -0,0 +1,24 @@
+#ifndef _EPT_IDLE_H
+#define _EPT_IDLE_H
+
+#define IDLE_BITMAP_CHUNK_SIZE sizeof(u64)
+#define IDLE_BITMAP_CHUNK_BITS (IDLE_BITMAP_CHUNK_SIZE * BITS_PER_BYTE)
+
+#define BITMAP_BYTE2PVA_SHIFT (3 + PAGE_SHIFT)
+
+#define EPT_IDLE_KBUF_FULL 1
+#define EPT_IDLE_KBUF_BYTES 8000
+#define EPT_IDLE_KBUF_BITS (EPT_IDLE_KBUF_BYTES * 8)
+
+struct ept_idle_ctrl {
+	struct kvm *kvm;
+
+	u64 kbuf[EPT_IDLE_KBUF_BITS / IDLE_BITMAP_CHUNK_BITS];
+	int bits_read;
+
+	void __user *buf;
+	int buf_size;
+	int bytes_copied;
+};
+
+#endif
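
Note on the address arithmetic: the following is a minimal user-space
sketch, not part of the patch, of how a read offset maps to an HVA range
and then to a GFN range within one memslot. It assumes 4 KiB pages. All
sketch_* names are hypothetical stand-ins: struct sketch_memslot models
only the three kvm_memory_slot fields the patch touches, and
sketch_hva_to_gfn() mirrors the arithmetic of hva_to_gfn_memslot().

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages (PAGE_SHIFT == 12), as on x86-64. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	(UINT64_C(1) << SKETCH_PAGE_SHIFT)
/* Same formula as BITMAP_BYTE2PVA_SHIFT: one bitmap byte covers 8 pages. */
#define SKETCH_BYTE2PVA_SHIFT	(3 + SKETCH_PAGE_SHIFT)

/* Hypothetical stand-in for the kvm_memory_slot fields used by the patch. */
struct sketch_memslot {
	uint64_t base_gfn;		/* first guest frame number of the slot */
	uint64_t npages;		/* slot length in pages */
	uint64_t userspace_addr;	/* HVA where the slot starts */
};

struct sketch_gfn_range {
	uint64_t start;	/* first GFN in the range */
	uint64_t end;	/* one past the last GFN */
	int valid;	/* 0 when the read does not intersect the slot */
};

/* Bitmap file offset in bytes -> first HVA that byte describes. */
static uint64_t sketch_pos_to_hva(uint64_t pos)
{
	return pos << SKETCH_BYTE2PVA_SHIFT;
}

/* Linear in-slot mapping: page offset from the slot start plus base_gfn. */
static uint64_t sketch_hva_to_gfn(uint64_t hva,
				  const struct sketch_memslot *slot)
{
	assert(hva >= slot->userspace_addr);
	return slot->base_gfn +
	       ((hva - slot->userspace_addr) >> SKETCH_PAGE_SHIFT);
}

/* Clamp the HVA range of a read to one slot, then convert it to GFNs. */
static struct sketch_gfn_range
sketch_slot_gfn_range(uint64_t pos, uint64_t count,
		      const struct sketch_memslot *slot)
{
	struct sketch_gfn_range r = { 0, 0, 0 };
	uint64_t start = sketch_pos_to_hva(pos);
	uint64_t end = start + (count << SKETCH_BYTE2PVA_SHIFT);
	uint64_t slot_end = slot->userspace_addr +
			    (slot->npages << SKETCH_PAGE_SHIFT);
	uint64_t hva_start = start > slot->userspace_addr ?
			     start : slot->userspace_addr;
	uint64_t hva_end = end < slot_end ? end : slot_end;

	if (hva_start >= hva_end)
		return r;	/* no overlap with this slot */

	r.start = sketch_hva_to_gfn(hva_start, slot);
	/* Round up so a partially covered last page is included. */
	r.end = sketch_hva_to_gfn(hva_end + SKETCH_PAGE_SIZE - 1, slot);
	r.valid = 1;
	return r;
}
```

For example, with a 16-page slot at HVA 0x10000 and base GFN 0x100, an
8-byte read at offset 0 covers HVA [0, 0x40000), which clamps to the slot
and gives GFNs [0x100, 0x110). The conversion happens once per slot, so
the page table walk itself only deals with GFNs.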