From patchwork Fri Dec 3 01:39:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Hyman Huang X-Patchwork-Id: 12654007 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 12F55C433F5 for ; Fri, 3 Dec 2021 01:44:20 +0000 (UTC) Received: from localhost ([::1]:53440 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1msxcW-0001QI-VP for qemu-devel@archiver.kernel.org; Thu, 02 Dec 2021 20:44:16 -0500 Received: from eggs.gnu.org ([209.51.188.92]:53024) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1msxYZ-0004no-Ol for qemu-devel@nongnu.org; Thu, 02 Dec 2021 20:40:11 -0500 Received: from prt-mail.chinatelecom.cn ([42.123.76.219]:59839 helo=chinatelecom.cn) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1msxYX-0006BY-6z for qemu-devel@nongnu.org; Thu, 02 Dec 2021 20:40:11 -0500 HMM_SOURCE_IP: 172.18.0.48:50602.1497958364 HMM_ATTACHE_NUM: 0000 HMM_SOURCE_TYPE: SMTP Received: from clientip-182.150.57.243 (unknown [172.18.0.48]) by chinatelecom.cn (HERMES) with SMTP id 6EFB32800B6; Fri, 3 Dec 2021 09:40:00 +0800 (CST) X-189-SAVE-TO-SEND: +huangy81@chinatelecom.cn Received: from ([172.18.0.48]) by app0024 with ESMTP id b696edf56e2e4906ac1984e4a184bde3 for qemu-devel@nongnu.org; Fri, 03 Dec 2021 09:40:07 CST X-Transaction-ID: b696edf56e2e4906ac1984e4a184bde3 X-Real-From: huangy81@chinatelecom.cn X-Receive-IP: 172.18.0.48 X-MEDUSA-Status: 0 From: huangy81@chinatelecom.cn To: qemu-devel Subject: [PATCH v9 1/3] migration/dirtyrate: implement vCPU dirtyrate calculation periodically Date: Fri, 3 Dec 2021 09:39:45 +0800 Message-Id: X-Mailer: git-send-email 1.8.3.1 In-Reply-To: References: In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=42.123.76.219; envelope-from=huangy81@chinatelecom.cn; helo=chinatelecom.cn X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: David Hildenbrand , Hyman , Juan Quintela , Richard Henderson , Markus ArmBruster , Peter Xu , "Dr. David Alan Gilbert" , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: Hyman Huang(黄勇) Introduce the third method GLOBAL_DIRTY_LIMIT of dirty tracking for calculate dirtyrate periodly for dirty restraint. Implement thread for calculate dirtyrate periodly, which will be used for dirty restraint. Add dirtylimit.h to introduce the util function for dirty limit implementation. Signed-off-by: Hyman Huang(黄勇) --- include/exec/memory.h | 5 +- include/sysemu/dirtylimit.h | 44 ++++++++++++++ migration/dirtyrate.c | 142 ++++++++++++++++++++++++++++++++++++++++---- migration/dirtyrate.h | 2 + 4 files changed, 182 insertions(+), 11 deletions(-) create mode 100644 include/sysemu/dirtylimit.h diff --git a/include/exec/memory.h b/include/exec/memory.h index 20f1b27..606bec8 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -69,7 +69,10 @@ static inline void fuzz_dma_read_cb(size_t addr, /* Dirty tracking enabled because measuring dirty rate */ #define GLOBAL_DIRTY_DIRTY_RATE (1U << 1) -#define GLOBAL_DIRTY_MASK (0x3) +/* Dirty tracking enabled because dirty limit */ +#define GLOBAL_DIRTY_LIMIT (1U << 2) + +#define GLOBAL_DIRTY_MASK (0x7) extern unsigned int global_dirty_tracking; diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h new file mode 100644 index 0000000..49298a2 --- /dev/null +++ b/include/sysemu/dirtylimit.h @@ -0,0 +1,44 @@ +/* + * dirty limit helper functions + * + * Copyright (c) 2021 CHINA TELECOM CO.,LTD. + * + * Authors: + * Hyman Huang(黄勇) + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ +#ifndef QEMU_DIRTYRLIMIT_H +#define QEMU_DIRTYRLIMIT_H + +#define DIRTYLIMIT_CALC_PERIOD_TIME_S 15 /* 15s */ + +/** + * dirtylimit_calc_current: + * + * get current dirty page rate for specified vCPU. + */ +int64_t dirtylimit_calc_current(int cpu_index); + +/** + * dirtylimit_calc: + * + * start dirty page rate calculation thread. + */ +void dirtylimit_calc(void); + +/** + * dirtylimit_calc_quit: + * + * quit dirty page rate calculation thread. + */ +void dirtylimit_calc_quit(void); + +/** + * dirtylimit_calc_state_init: + * + * initialize dirty page rate calculation state. + */ +void dirtylimit_calc_state_init(int max_cpus); +#endif diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c index d65e744..5b969be 100644 --- a/migration/dirtyrate.c +++ b/migration/dirtyrate.c @@ -27,6 +27,7 @@ #include "qapi/qmp/qdict.h" #include "sysemu/kvm.h" #include "sysemu/runstate.h" +#include "sysemu/dirtylimit.h" #include "exec/memory.h" /* @@ -46,6 +47,137 @@ static struct DirtyRateStat DirtyStat; static DirtyRateMeasureMode dirtyrate_mode = DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING; +#define DIRTYLIMIT_CALC_TIME_MS 1000 /* 1000ms */ + +struct { + DirtyRatesData data; + int64_t period; + bool quit; +} *dirtylimit_calc_state; + +static void dirtylimit_global_dirty_log_start(void) +{ + qemu_mutex_lock_iothread(); + memory_global_dirty_log_start(GLOBAL_DIRTY_LIMIT); + qemu_mutex_unlock_iothread(); +} + +static void dirtylimit_global_dirty_log_stop(void) +{ + qemu_mutex_lock_iothread(); + memory_global_dirty_log_stop(GLOBAL_DIRTY_LIMIT); + qemu_mutex_unlock_iothread(); +} + +static inline void record_dirtypages(DirtyPageRecord *dirty_pages, + CPUState *cpu, bool start) +{ + if (start) { + dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages; + } else { + dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages; + } +} + +static void dirtylimit_calc_func(void) +{ + CPUState *cpu; + DirtyPageRecord *dirty_pages; + int64_t start_time, end_time, calc_time; + DirtyRateVcpu rate; + int i = 0; + + dirty_pages = g_malloc0(sizeof(*dirty_pages) * + dirtylimit_calc_state->data.nvcpu); + + CPU_FOREACH(cpu) { + record_dirtypages(dirty_pages, cpu, true); + } + + start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); + g_usleep(DIRTYLIMIT_CALC_TIME_MS * 1000); + end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); + calc_time = end_time - start_time; + + qemu_mutex_lock_iothread(); + memory_global_dirty_log_sync(); + qemu_mutex_unlock_iothread(); + + CPU_FOREACH(cpu) { + record_dirtypages(dirty_pages, cpu, false); + } + + for (i = 0; i < dirtylimit_calc_state->data.nvcpu; i++) { + uint64_t increased_dirty_pages = + dirty_pages[i].end_pages - dirty_pages[i].start_pages; + uint64_t memory_size_MB = + (increased_dirty_pages * TARGET_PAGE_SIZE) >> 20; + int64_t dirtyrate = (memory_size_MB * 1000) / calc_time; + + rate.id = i; + rate.dirty_rate = dirtyrate; + dirtylimit_calc_state->data.rates[i] = rate; + + trace_dirtyrate_do_calculate_vcpu(i, + dirtylimit_calc_state->data.rates[i].dirty_rate); + } +} + +static void *dirtylimit_calc_thread(void *opaque) +{ + rcu_register_thread(); + + dirtylimit_global_dirty_log_start(); + + while (!qatomic_read(&dirtylimit_calc_state->quit)) { + dirtylimit_calc_func(); + sleep(dirtylimit_calc_state->period); + } + + dirtylimit_global_dirty_log_stop(); + + rcu_unregister_thread(); + return NULL; +} + +int64_t dirtylimit_calc_current(int cpu_index) +{ + DirtyRateVcpu *rates = dirtylimit_calc_state->data.rates; + + return qatomic_read(&rates[cpu_index].dirty_rate); +} + +void dirtylimit_calc(void) +{ + if (unlikely(qatomic_read(&dirtylimit_calc_state->quit))) { + qatomic_set(&dirtylimit_calc_state->quit, 0); + QemuThread thread; + qemu_thread_create(&thread, "dirtylimit-calc", + dirtylimit_calc_thread, + NULL, QEMU_THREAD_DETACHED); + } +} + +void dirtylimit_calc_quit(void) +{ + qatomic_set(&dirtylimit_calc_state->quit, 1); +} + +void dirtylimit_calc_state_init(int max_cpus) +{ + dirtylimit_calc_state = + g_malloc0(sizeof(*dirtylimit_calc_state)); + + dirtylimit_calc_state->data.nvcpu = max_cpus; + dirtylimit_calc_state->data.rates = + g_malloc0(sizeof(DirtyRateVcpu) * max_cpus); + + dirtylimit_calc_state->period = + DIRTYLIMIT_CALC_PERIOD_TIME_S; + + dirtylimit_calc_state->quit = true; +} + static int64_t set_sample_page_period(int64_t msec, int64_t initial_time) { int64_t current_time; @@ -396,16 +528,6 @@ static bool compare_page_hash_info(struct RamblockDirtyInfo *info, return true; } -static inline void record_dirtypages(DirtyPageRecord *dirty_pages, - CPUState *cpu, bool start) -{ - if (start) { - dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages; - } else { - dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages; - } -} - static void dirtyrate_global_dirty_log_start(void) { qemu_mutex_lock_iothread(); diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h index 69d4c5b..e96acdc 100644 --- a/migration/dirtyrate.h +++ b/migration/dirtyrate.h @@ -70,6 +70,8 @@ typedef struct VcpuStat { DirtyRateVcpu *rates; /* array of dirty rate for each vcpu */ } VcpuStat; +typedef struct VcpuStat DirtyRatesData; + /* * Store calculation statistics for each measure. */ From patchwork Fri Dec 3 01:39:46 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Hyman Huang X-Patchwork-Id: 12653999 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B4EF7C433F5 for ; Fri, 3 Dec 2021 01:42:06 +0000 (UTC) Received: from localhost ([::1]:48926 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1msxaP-0006qm-El for qemu-devel@archiver.kernel.org; Thu, 02 Dec 2021 20:42:05 -0500 Received: from eggs.gnu.org ([209.51.188.92]:53038) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1msxYc-0004ou-TV for qemu-devel@nongnu.org; Thu, 02 Dec 2021 20:40:14 -0500 Received: from prt-mail.chinatelecom.cn ([42.123.76.219]:59855 helo=chinatelecom.cn) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1msxYZ-0006Bi-Ob for qemu-devel@nongnu.org; Thu, 02 Dec 2021 20:40:14 -0500 HMM_SOURCE_IP: 172.18.0.48:50602.1497958364 HMM_ATTACHE_NUM: 0000 HMM_SOURCE_TYPE: SMTP Received: from clientip-182.150.57.243 (unknown [172.18.0.48]) by chinatelecom.cn (HERMES) with SMTP id 424592800CA; Fri, 3 Dec 2021 09:40:03 +0800 (CST) X-189-SAVE-TO-SEND: +huangy81@chinatelecom.cn Received: from ([172.18.0.48]) by app0024 with ESMTP id 6f34526f658447749ed711090d0786e3 for qemu-devel@nongnu.org; Fri, 03 Dec 2021 09:40:10 CST X-Transaction-ID: 6f34526f658447749ed711090d0786e3 X-Real-From: huangy81@chinatelecom.cn X-Receive-IP: 172.18.0.48 X-MEDUSA-Status: 0 From: huangy81@chinatelecom.cn To: qemu-devel Subject: [PATCH v9 2/3] cpu-throttle: implement vCPU throttle Date: Fri, 3 Dec 2021 09:39:46 +0800 Message-Id: <155c8ef6b68c06829f971d356732783c671f661a.1638495274.git.huangy81@chinatelecom.cn> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: References: In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=42.123.76.219; envelope-from=huangy81@chinatelecom.cn; helo=chinatelecom.cn X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: David Hildenbrand , Hyman , Juan Quintela , Richard Henderson , Markus ArmBruster , Peter Xu , "Dr. David Alan Gilbert" , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: Hyman Huang(黄勇) Impose dirty restraint on vCPU by kicking it and sleep as the auto-converge does during migration, but just kick the specified vCPU instead, not all vCPUs of vm. Start a thread to track the dirtylimit status and adjust the throttle pencentage dynamically depend on current and quota dirtyrate. Introduce the util function in the header for dirtylimit implementation. Signed-off-by: Hyman Huang(黄勇) --- include/sysemu/cpu-throttle.h | 45 ++++++ qapi/migration.json | 22 +++ softmmu/cpu-throttle.c | 355 ++++++++++++++++++++++++++++++++++++++++++ softmmu/trace-events | 5 + 4 files changed, 427 insertions(+) diff --git a/include/sysemu/cpu-throttle.h b/include/sysemu/cpu-throttle.h index d65bdef..962990b 100644 --- a/include/sysemu/cpu-throttle.h +++ b/include/sysemu/cpu-throttle.h @@ -65,4 +65,49 @@ bool cpu_throttle_active(void); */ int cpu_throttle_get_percentage(void); +/** + * dirtylimit_enabled + * + * Returns: %true if dirty page limit for vCPU is enabled, %false otherwise. + */ +bool dirtylimit_enabled(int cpu_index); + +/** + * dirtylimit_is_vcpu_index_valid + * + * Returns: %true if cpu index valid, %false otherwise. + */ +bool dirtylimit_is_vcpu_index_valid(int cpu_index); + +/** + * dirtylimit_state_init: + * + * initialize golobal state for dirtylimit + */ +void dirtylimit_state_init(int max_cpus); + +/** + * dirtylimit_vcpu: + * + * impose dirtylimit on vcpu util reaching the quota dirtyrate + */ +void dirtylimit_vcpu(int cpu_index, + uint64_t quota); + +/** + * dirtylimit_query_vcpu: + * + * Returns: dirty page limit information of specified virtual CPU. + */ +struct DirtyLimitInfo *dirtylimit_query_vcpu(int cpu_index); + +/** + * dirtylimit_cancel_vcpu: + * + * cancel dirtylimit for the specified vcpu + * + * Returns: the number of running threads for dirtylimit + */ +int dirtylimit_cancel_vcpu(int cpu_index); + #endif /* SYSEMU_CPU_THROTTLE_H */ diff --git a/qapi/migration.json b/qapi/migration.json index bbfd48c..3da8fdf 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -1850,6 +1850,28 @@ { 'command': 'query-dirty-rate', 'returns': 'DirtyRateInfo' } ## +# @DirtyLimitInfo: +# +# Dirty page rate limit information of virtual CPU. +# +# @cpu-index: index of virtual CPU. +# +# @enable: true to enable, false to disable. +# +# @limit-rate: upper limit of dirty page rate for virtual CPU. +# +# @current-rate: current dirty page rate for virtual CPU. +# +# Since: 7.0 +# +## +{ 'struct': 'DirtyLimitInfo', + 'data': { 'cpu-index': 'int', + 'enable': 'bool', + 'limit-rate': 'int64', + 'current-rate': 'int64' } } + +## # @snapshot-save: # # Save a VM snapshot diff --git a/softmmu/cpu-throttle.c b/softmmu/cpu-throttle.c index 8c2144a..ca0f440 100644 --- a/softmmu/cpu-throttle.c +++ b/softmmu/cpu-throttle.c @@ -29,6 +29,9 @@ #include "qemu/main-loop.h" #include "sysemu/cpus.h" #include "sysemu/cpu-throttle.h" +#include "sysemu/dirtylimit.h" +#include "qapi/qapi-commands-migration.h" +#include "trace.h" /* vcpu throttling controls */ static QEMUTimer *throttle_timer; @@ -38,6 +41,358 @@ static unsigned int throttle_percentage; #define CPU_THROTTLE_PCT_MAX 99 #define CPU_THROTTLE_TIMESLICE_NS 10000000 +#define DIRTYLIMIT_TOLERANCE_RANGE 15 /* 15MB/s */ + +#define DIRTYLIMIT_THROTTLE_HEAVY_WATERMARK 75 +#define DIRTYLIMIT_THROTTLE_SLIGHT_WATERMARK 90 + +#define DIRTYLIMIT_THROTTLE_HEAVY_STEP_SIZE 5 +#define DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE 2 + +typedef enum { + RESTRAIN_KEEP, + RESTRAIN_RATIO, + RESTRAIN_HEAVY, + RESTRAIN_SLIGHT, +} RestrainPolicy; + +typedef struct DirtyLimitState { + int cpu_index; + bool enabled; + uint64_t quota; /* quota dirtyrate MB/s */ + QemuThread thread; + char *name; /* thread name */ +} DirtyLimitState; + +struct { + DirtyLimitState *states; + int max_cpus; + unsigned long *bmap; /* running thread bitmap */ + unsigned long nr; +} *dirtylimit_state; + +bool dirtylimit_enabled(int cpu_index) +{ + return qatomic_read(&dirtylimit_state->states[cpu_index].enabled); +} + +static bool dirtylimit_is_vcpu_unplug(int cpu_index) +{ + CPUState *cpu; + CPU_FOREACH(cpu) { + if (cpu->cpu_index == cpu_index) { + break; + } + } + + return cpu->unplug; +} + +bool dirtylimit_is_vcpu_index_valid(int cpu_index) +{ + if (cpu_index < 0 || + cpu_index >= qatomic_read(&dirtylimit_state->max_cpus) || + dirtylimit_is_vcpu_unplug(cpu_index)) { + return false; + } + + return true; +} + +static inline void dirtylimit_set_quota(int cpu_index, uint64_t quota) +{ + qatomic_set(&dirtylimit_state->states[cpu_index].quota, quota); +} + +static inline uint64_t dirtylimit_quota(int cpu_index) +{ + return qatomic_read(&dirtylimit_state->states[cpu_index].quota); +} + +static int64_t dirtylimit_current(int cpu_index) +{ + return dirtylimit_calc_current(cpu_index); +} + +static void dirtylimit_vcpu_thread(CPUState *cpu, run_on_cpu_data data) +{ + double pct; + double throttle_ratio; + int64_t sleeptime_ns, endtime_ns; + int *percentage = (int *)data.host_ptr; + + pct = (double)(*percentage) / 100; + throttle_ratio = pct / (1 - pct); + /* Add 1ns to fix double's rounding error (like 0.9999999...) */ + sleeptime_ns = (int64_t)(throttle_ratio * CPU_THROTTLE_TIMESLICE_NS + 1); + endtime_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + sleeptime_ns; + while (sleeptime_ns > 0 && !cpu->stop) { + if (sleeptime_ns > SCALE_MS) { + qemu_cond_timedwait_iothread(cpu->halt_cond, + sleeptime_ns / SCALE_MS); + } else { + qemu_mutex_unlock_iothread(); + g_usleep(sleeptime_ns / SCALE_US); + qemu_mutex_lock_iothread(); + } + sleeptime_ns = endtime_ns - qemu_clock_get_ns(QEMU_CLOCK_REALTIME); + } + qatomic_set(&cpu->throttle_thread_scheduled, 0); + + free(percentage); +} + +static void dirtylimit_check(int cpu_index, + int percentage) +{ + CPUState *cpu; + int64_t sleeptime_ns, starttime_ms, currenttime_ms; + int *pct_parameter; + double pct; + + pct = (double) percentage / 100; + + starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); + + while (true) { + CPU_FOREACH(cpu) { + if ((cpu_index == cpu->cpu_index) && + (!qatomic_xchg(&cpu->throttle_thread_scheduled, 1))) { + pct_parameter = malloc(sizeof(*pct_parameter)); + *pct_parameter = percentage; + async_run_on_cpu(cpu, dirtylimit_vcpu_thread, + RUN_ON_CPU_HOST_PTR(pct_parameter)); + break; + } + } + + sleeptime_ns = CPU_THROTTLE_TIMESLICE_NS / (1 - pct); + g_usleep(sleeptime_ns / SCALE_US); + + currenttime_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); + if (unlikely((currenttime_ms - starttime_ms) > + (DIRTYLIMIT_CALC_PERIOD_TIME_S * 1000))) { + break; + } + } +} + +static uint64_t dirtylimit_init_pct(uint64_t quota, + uint64_t current) +{ + uint64_t limit_pct = 0; + + if (quota >= current || (current == 0) || + ((current - quota) <= DIRTYLIMIT_TOLERANCE_RANGE)) { + limit_pct = 0; + } else { + limit_pct = (current - quota) * 100 / current; + + limit_pct = MIN(limit_pct, + DIRTYLIMIT_THROTTLE_HEAVY_WATERMARK); + } + + return limit_pct; +} + +static RestrainPolicy dirtylimit_policy(unsigned int last_pct, + uint64_t quota, + uint64_t current) +{ + uint64_t max, min; + + max = MAX(quota, current); + min = MIN(quota, current); + if ((max - min) <= DIRTYLIMIT_TOLERANCE_RANGE) { + return RESTRAIN_KEEP; + } + if (last_pct < DIRTYLIMIT_THROTTLE_HEAVY_WATERMARK) { + /* last percentage locates in [0, 75)*/ + return RESTRAIN_RATIO; + } else if (last_pct < DIRTYLIMIT_THROTTLE_SLIGHT_WATERMARK) { + /* last percentage locates in [75, 90)*/ + return RESTRAIN_HEAVY; + } else { + /* last percentage locates in [90, 99]*/ + return RESTRAIN_SLIGHT; + } +} + +static uint64_t dirtylimit_pct(unsigned int last_pct, + uint64_t quota, + uint64_t current) +{ + uint64_t limit_pct = 0; + RestrainPolicy policy; + bool mitigate = (quota > current) ? true : false; + + if (mitigate && ((current == 0) || + (last_pct <= DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE))) { + return 0; + } + + policy = dirtylimit_policy(last_pct, quota, current); + switch (policy) { + case RESTRAIN_SLIGHT: + /* [90, 99] */ + if (mitigate) { + limit_pct = + last_pct - DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE; + } else { + limit_pct = + last_pct + DIRTYLIMIT_THROTTLE_SLIGHT_STEP_SIZE; + + limit_pct = MIN(limit_pct, CPU_THROTTLE_PCT_MAX); + } + break; + case RESTRAIN_HEAVY: + /* [75, 90) */ + if (mitigate) { + limit_pct = + last_pct - DIRTYLIMIT_THROTTLE_HEAVY_STEP_SIZE; + } else { + limit_pct = + last_pct + DIRTYLIMIT_THROTTLE_HEAVY_STEP_SIZE; + + limit_pct = MIN(limit_pct, + DIRTYLIMIT_THROTTLE_SLIGHT_WATERMARK); + } + break; + case RESTRAIN_RATIO: + /* [0, 75) */ + if (mitigate) { + if (last_pct <= (((quota - current) * 100 / quota))) { + limit_pct = 0; + } else { + limit_pct = last_pct - + ((quota - current) * 100 / quota); + limit_pct = MAX(limit_pct, CPU_THROTTLE_PCT_MIN); + } + } else { + limit_pct = last_pct + + ((current - quota) * 100 / current); + + limit_pct = MIN(limit_pct, + DIRTYLIMIT_THROTTLE_HEAVY_WATERMARK); + } + break; + case RESTRAIN_KEEP: + default: + limit_pct = last_pct; + break; + } + + return limit_pct; +} + +static void *dirtylimit_thread(void *opaque) +{ + int cpu_index = *(int *)opaque; + uint64_t quota_dirtyrate, current_dirtyrate; + unsigned int last_pct = 0; + unsigned int pct = 0; + + rcu_register_thread(); + + quota_dirtyrate = dirtylimit_quota(cpu_index); + current_dirtyrate = dirtylimit_current(cpu_index); + + pct = dirtylimit_init_pct(quota_dirtyrate, current_dirtyrate); + + do { + trace_dirtylimit_impose(cpu_index, + quota_dirtyrate, current_dirtyrate, pct); + + last_pct = pct; + if (pct == 0) { + sleep(DIRTYLIMIT_CALC_PERIOD_TIME_S); + } else { + dirtylimit_check(cpu_index, pct); + } + + quota_dirtyrate = dirtylimit_quota(cpu_index); + current_dirtyrate = dirtylimit_current(cpu_index); + + pct = dirtylimit_pct(last_pct, quota_dirtyrate, current_dirtyrate); + } while (dirtylimit_enabled(cpu_index)); + + rcu_unregister_thread(); + + return NULL; +} + +int dirtylimit_cancel_vcpu(int cpu_index) +{ + int i; + int nr_threads = 0; + + qatomic_set(&dirtylimit_state->states[cpu_index].enabled, 0); + dirtylimit_set_quota(cpu_index, 0); + + bitmap_test_and_clear_atomic(dirtylimit_state->bmap, cpu_index, 1); + + for (i = 0; i < dirtylimit_state->nr; i++) { + unsigned long temp = dirtylimit_state->bmap[i]; + nr_threads += ctpopl(temp); + } + + return nr_threads; +} + +void dirtylimit_vcpu(int cpu_index, + uint64_t quota) +{ + trace_dirtylimit_vcpu(cpu_index, quota); + + dirtylimit_set_quota(cpu_index, quota); + + if (unlikely(!dirtylimit_enabled(cpu_index))) { + qatomic_set(&dirtylimit_state->states[cpu_index].enabled, 1); + dirtylimit_state->states[cpu_index].name = + g_strdup_printf("dirtylimit-%d", cpu_index); + qemu_thread_create(&dirtylimit_state->states[cpu_index].thread, + dirtylimit_state->states[cpu_index].name, + dirtylimit_thread, + (void *)&dirtylimit_state->states[cpu_index].cpu_index, + QEMU_THREAD_DETACHED); + bitmap_set_atomic(dirtylimit_state->bmap, cpu_index, 1); + } +} + +struct DirtyLimitInfo *dirtylimit_query_vcpu(int cpu_index) +{ + DirtyLimitInfo *info = NULL; + + info = g_malloc0(sizeof(*info)); + info->cpu_index = cpu_index; + info->enable = dirtylimit_enabled(cpu_index); + info->limit_rate= dirtylimit_quota(cpu_index);; + info->current_rate = dirtylimit_current(cpu_index); + + return info; +} + +void dirtylimit_state_init(int max_cpus) +{ + int i; + + dirtylimit_state = g_malloc0(sizeof(*dirtylimit_state)); + + dirtylimit_state->states = + g_malloc0(sizeof(DirtyLimitState) * max_cpus); + + for (i = 0; i < max_cpus; i++) { + dirtylimit_state->states[i].cpu_index = i; + } + + dirtylimit_state->max_cpus = max_cpus; + dirtylimit_state->bmap = bitmap_new(max_cpus); + bitmap_clear(dirtylimit_state->bmap, 0, max_cpus); + dirtylimit_state->nr = BITS_TO_LONGS(max_cpus); + + trace_dirtylimit_state_init(max_cpus); +} + static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque) { double pct; diff --git a/softmmu/trace-events b/softmmu/trace-events index 9c88887..a7c9c04 100644 --- a/softmmu/trace-events +++ b/softmmu/trace-events @@ -31,3 +31,8 @@ runstate_set(int current_state, const char *current_state_str, int new_state, co system_wakeup_request(int reason) "reason=%d" qemu_system_shutdown_request(int reason) "reason=%d" qemu_system_powerdown_request(void) "" + +#cpu-throttle.c +dirtylimit_state_init(int max_cpus) "dirtylimit state init: max cpus %d" +dirtylimit_impose(int cpu_index, uint64_t quota, uint64_t current, int pct) "CPU[%d] impose dirtylimit: quota %" PRIu64 ", current %" PRIu64 ", percentage %d" +dirtylimit_vcpu(int cpu_index, uint64_t quota) "CPU[%d] set quota dirtylimit %"PRIu64 From patchwork Fri Dec 3 01:39:47 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Hyman Huang X-Patchwork-Id: 12654001 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 908C1C433F5 for ; Fri, 3 Dec 2021 01:42:11 +0000 (UTC) Received: from localhost ([::1]:49374 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1msxaU-00078r-Mx for qemu-devel@archiver.kernel.org; Thu, 02 Dec 2021 20:42:10 -0500 Received: from eggs.gnu.org ([209.51.188.92]:53086) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1msxYh-0004yk-KM for qemu-devel@nongnu.org; Thu, 02 Dec 2021 20:40:19 -0500 Received: from prt-mail.chinatelecom.cn ([42.123.76.219]:59868 helo=chinatelecom.cn) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1msxYe-0006C2-G0 for qemu-devel@nongnu.org; Thu, 02 Dec 2021 20:40:19 -0500 HMM_SOURCE_IP: 172.18.0.48:50602.1497958364 HMM_ATTACHE_NUM: 0000 HMM_SOURCE_TYPE: SMTP Received: from clientip-182.150.57.243 (unknown [172.18.0.48]) by chinatelecom.cn (HERMES) with SMTP id 0AFD12800A5; Fri, 3 Dec 2021 09:40:06 +0800 (CST) X-189-SAVE-TO-SEND: +huangy81@chinatelecom.cn Received: from ([172.18.0.48]) by app0024 with ESMTP id c2d06b376cd14be4a3b02605cec3c5c8 for qemu-devel@nongnu.org; Fri, 03 Dec 2021 09:40:13 CST X-Transaction-ID: c2d06b376cd14be4a3b02605cec3c5c8 X-Real-From: huangy81@chinatelecom.cn X-Receive-IP: 172.18.0.48 X-MEDUSA-Status: 0 From: huangy81@chinatelecom.cn To: qemu-devel Subject: [PATCH v9 3/3] cpus-common: implement dirty page limit on vCPU Date: Fri, 3 Dec 2021 09:39:47 +0800 Message-Id: <9cc3cc5377e4330cbe0e87e89f452889516a4c09.1638495274.git.huangy81@chinatelecom.cn> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: References: In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=42.123.76.219; envelope-from=huangy81@chinatelecom.cn; helo=chinatelecom.cn X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: David Hildenbrand , Hyman , Juan Quintela , Richard Henderson , Markus ArmBruster , Peter Xu , "Dr. David Alan Gilbert" , Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: Hyman Huang(黄勇) Implement dirtyrate calculation periodically basing on dirty-ring and throttle vCPU until it reachs the quota dirty page rate given by user. Introduce qmp commands "vcpu-dirty-limit", "query-vcpu-dirty-limit" to enable, disable, query dirty page limit for virtual CPU. Meanwhile, introduce corresponding hmp commands "vcpu_dirty_limit", "info vcpu_dirty_limit" so developers can play with them easier. Signed-off-by: Hyman Huang(黄勇) --- cpus-common.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++++++ hmp-commands-info.hx | 13 +++++ hmp-commands.hx | 16 ++++++ include/hw/core/cpu.h | 9 +++ include/monitor/hmp.h | 2 + qapi/migration.json | 48 ++++++++++++++++ softmmu/vl.c | 1 + 7 files changed, 238 insertions(+) diff --git a/cpus-common.c b/cpus-common.c index 6e73d3e..04b9bc9 100644 --- a/cpus-common.c +++ b/cpus-common.c @@ -23,6 +23,14 @@ #include "hw/core/cpu.h" #include "sysemu/cpus.h" #include "qemu/lockable.h" +#include "sysemu/dirtylimit.h" +#include "sysemu/cpu-throttle.h" +#include "sysemu/kvm.h" +#include "monitor/hmp.h" +#include "monitor/monitor.h" +#include "qapi/qmp/qdict.h" +#include "qapi/error.h" +#include "qapi/qapi-commands-migration.h" static QemuMutex qemu_cpu_list_lock; static QemuCond exclusive_cond; @@ -352,3 +360,144 @@ void process_queued_cpu_work(CPUState *cpu) qemu_mutex_unlock(&cpu->work_mutex); qemu_cond_broadcast(&qemu_work_cond); } + +void qmp_vcpu_dirty_limit(int64_t cpu_index, + bool enable, + uint64_t dirty_rate, + Error **errp) +{ + if (!kvm_enabled() || !kvm_dirty_ring_enabled()) { + error_setg(errp, "dirty page limit feature requires KVM with" + " accelerator property 'dirty-ring-size' set'"); + return; + } + + if (!dirtylimit_is_vcpu_index_valid(cpu_index)) { + error_setg(errp, "cpu index out of range"); + return; + } + + if (enable) { + dirtylimit_calc(); + dirtylimit_vcpu(cpu_index, dirty_rate); + } else { + if (!dirtylimit_enabled(cpu_index)) { + error_setg(errp, "dirty page limit for CPU %ld not set", + cpu_index); + return; + } + + if (!dirtylimit_cancel_vcpu(cpu_index)) { + dirtylimit_calc_quit(); + } + } +} + +void hmp_vcpu_dirty_limit(Monitor *mon, const QDict *qdict) +{ + int64_t cpu_index = qdict_get_try_int(qdict, "cpu_index", -1); + int64_t dirty_rate = qdict_get_try_int(qdict, "dirty_rate", -1); + bool enable = qdict_get_bool(qdict, "enable"); + Error *err = NULL; + + if (enable && dirty_rate < 0) { + monitor_printf(mon, "Enabling dirty limit requires dirty_rate set!\n"); + return; + } + + if (!dirtylimit_is_vcpu_index_valid(cpu_index)) { + monitor_printf(mon, "Incorrect cpu index specified!\n"); + return; + } + + qmp_vcpu_dirty_limit(cpu_index, enable, dirty_rate, &err); + + if (err) { + hmp_handle_error(mon, err); + return; + } + + monitor_printf(mon, "Enabling dirty page rate limit with %"PRIi64 + " MB/s\n", dirty_rate); + + monitor_printf(mon, "[Please use 'info vcpu_dirty_limit' to query " + "dirty limit for virtual CPU]\n"); +} + +struct DirtyLimitInfoList *qmp_query_vcpu_dirty_limit(bool has_cpu_index, + int64_t cpu_index, + Error **errp) +{ + DirtyLimitInfo *info = NULL; + DirtyLimitInfoList *head = NULL, **tail = &head; + + if (has_cpu_index && + (!dirtylimit_is_vcpu_index_valid(cpu_index))) { + error_setg(errp, "cpu index out of range"); + return NULL; + } + + if (has_cpu_index) { + info = dirtylimit_query_vcpu(cpu_index); + QAPI_LIST_APPEND(tail, info); + } else { + CPUState *cpu; + CPU_FOREACH(cpu) { + if (!cpu->unplug) { + info = dirtylimit_query_vcpu(cpu->cpu_index); + QAPI_LIST_APPEND(tail, info); + } + } + } + + return head; +} + +void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict) +{ + DirtyLimitInfoList *limit, *head, *info = NULL; + int64_t cpu_index = qdict_get_try_int(qdict, "cpu_index", -1); + Error *err = NULL; + + if (cpu_index >=0 && + !dirtylimit_is_vcpu_index_valid(cpu_index)) { + monitor_printf(mon, "cpu index out of range\n"); + return; + } + + if (cpu_index < 0) { + info = qmp_query_vcpu_dirty_limit(false, -1, &err); + } else { + info = qmp_query_vcpu_dirty_limit(true, cpu_index, &err); + } + + if (err) { + hmp_handle_error(mon, err); + return; + } + + head = info; + for (limit = head; limit != NULL; limit = limit->next) { + monitor_printf(mon, "vcpu[%"PRIi64"], Enable: %s", + limit->value->cpu_index, + limit->value->enable ? "true" : "false"); + if (limit->value->enable) { + monitor_printf(mon, ", limit rate %"PRIi64 " (MB/s)," + " current rate %"PRIi64 " (MB/s)\n", + limit->value->limit_rate, + limit->value->current_rate); + } else { + monitor_printf(mon, "\n"); + } + } +} + +void dirtylimit_setup(int max_cpus) +{ + if (!kvm_enabled() || !kvm_dirty_ring_enabled()) { + return; + } + + dirtylimit_calc_state_init(max_cpus); + dirtylimit_state_init(max_cpus); +} diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx index 407a1da..aff28d9 100644 --- a/hmp-commands-info.hx +++ b/hmp-commands-info.hx @@ -863,6 +863,19 @@ SRST Display the vcpu dirty rate information. ERST + { + .name = "vcpu_dirty_limit", + .args_type = "cpu_index:l?", + .params = "cpu_index", + .help = "show dirty page limit information", + .cmd = hmp_info_vcpu_dirty_limit, + }, + +SRST + ``info vcpu_dirty_limit`` + Display the vcpu dirty page limit information. +ERST + #if defined(TARGET_I386) { .name = "sgx", diff --git a/hmp-commands.hx b/hmp-commands.hx index 70a9136..1839fa4 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1744,3 +1744,19 @@ ERST "\n\t\t\t -b to specify dirty bitmap as method of calculation)", .cmd = hmp_calc_dirty_rate, }, + +SRST +``vcpu_dirty_limit`` + Start dirty page rate limit on a virtual CPU, the information about all the + virtual CPU dirty limit status can be observed with ``info vcpu_dirty_limit`` + command. +ERST + + { + .name = "vcpu_dirty_limit", + .args_type = "cpu_index:l,enable:b,dirty_rate:l?", + .params = "cpu_index on|off [dirty_rate]", + .help = "enable, disable dirty page rate limit on a virtual CPU" + "(dirty_rate should be specified dirty_rate if enable)", + .cmd = hmp_vcpu_dirty_limit, + }, diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h index e948e81..11df012 100644 --- a/include/hw/core/cpu.h +++ b/include/hw/core/cpu.h @@ -881,6 +881,15 @@ void end_exclusive(void); */ void qemu_init_vcpu(CPUState *cpu); +/** + * dirtylimit_setup: + * + * Initializes the global state of dirtylimit calculation and + * dirtylimit itself. This is prepared for vCPU dirtylimit which + * could be triggered during vm lifecycle. + */ +void dirtylimit_setup(int max_cpus); + #define SSTEP_ENABLE 0x1 /* Enable simulated HW single stepping */ #define SSTEP_NOIRQ 0x2 /* Do not use IRQ while single stepping */ #define SSTEP_NOTIMER 0x4 /* Do not Timers while single stepping */ diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h index 96d0148..04879a2 100644 --- a/include/monitor/hmp.h +++ b/include/monitor/hmp.h @@ -131,6 +131,8 @@ void hmp_replay_delete_break(Monitor *mon, const QDict *qdict); void hmp_replay_seek(Monitor *mon, const QDict *qdict); void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict); void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict); +void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict); +void hmp_vcpu_dirty_limit(Monitor *mon, const QDict *qdict); void hmp_human_readable_text_helper(Monitor *mon, HumanReadableText *(*qmp_handler)(Error **)); diff --git a/qapi/migration.json b/qapi/migration.json index 3da8fdf..dc15b3f 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -1872,6 +1872,54 @@ 'current-rate': 'int64' } } ## +# @vcpu-dirty-limit: +# +# Set or cancel the upper limit of dirty page rate for a virtual CPU. +# +# Requires KVM with accelerator property "dirty-ring-size" set. +# A virtual CPU's dirty page rate is a measure of its memory load. +# To observe dirty page rates, use @calc-dirty-rate. +# +# @cpu-index: index of virtual CPU. +# +# @enable: true to enable, false to disable. +# +# @dirty-rate: upper limit of dirty page rate for virtual CPU. +# +# Since: 7.0 +# +# Example: +# {"execute": "vcpu-dirty-limit"} +# "arguments": { "cpu-index": 0, +# "enable": true, +# "dirty-rate": 200 } } +# +## +{ 'command': 'vcpu-dirty-limit', + 'data': { 'cpu-index': 'int', + 'enable': 'bool', + 'dirty-rate': 'uint64'} } + +## +# @query-vcpu-dirty-limit: +# +# Returns information about the virtual CPU dirty limit status. +# +# @cpu-index: index of the virtual CPU to query, if not specified, all +# virtual CPUs will be queried. +# +# Since: 7.0 +# +# Example: +# {"execute": "query-vcpu-dirty-limit"} +# "arguments": { "cpu-index": 0 } } +# +## +{ 'command': 'query-vcpu-dirty-limit', + 'data': { '*cpu-index': 'int' }, + 'returns': [ 'DirtyLimitInfo' ] } + +## # @snapshot-save: # # Save a VM snapshot diff --git a/softmmu/vl.c b/softmmu/vl.c index 620a1f1..0f83ce3 100644 --- a/softmmu/vl.c +++ b/softmmu/vl.c @@ -3777,5 +3777,6 @@ void qemu_init(int argc, char **argv, char **envp) qemu_init_displays(); accel_setup_post(current_machine); os_setup_post(); + dirtylimit_setup(current_machine->smp.max_cpus); resume_mux_open(); }