From patchwork Mon Sep 8 08:25:36 2014
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 4860371
Date: Mon, 8 Sep 2014 01:25:36 -0700 (PDT)
From: Hugh Dickins
To: Peter Zijlstra
Cc: Hugh Dickins, Chintan Pandya, akpm@linux-foundation.org,
    linux-mm@kvack.org, linux-arm-msm@vger.kernel.org,
    linux-kernel@vger.kernel.org, Thomas Gleixner, John Stultz,
    Ingo Molnar
Subject: Re: [PATCH v4 2/2] ksm: provide support to use deferrable timers for scanner thread
In-Reply-To: <20140903095815.GK4783@worktop.ger.corp.intel.com>
References: <1408536628-29379-1-git-send-email-cpandya@codeaurora.org>
 <1408536628-29379-2-git-send-email-cpandya@codeaurora.org>
 <20140903095815.GK4783@worktop.ger.corp.intel.com>

On Wed, 3 Sep 2014, Peter Zijlstra wrote:
> On Wed, Aug 27, 2014 at 11:02:20PM -0700, Hugh Dickins wrote:
> > On Wed, 20 Aug 2014, Chintan Pandya wrote:
> > >
> > > The KSM thread that scans pages is scheduled on a fixed timeout.
> > > That wakes the CPU from idle state and hence may affect power
> > > consumption.  Provide optional support to use a deferrable timer,
> > > which suits low-power use-cases.
> > >
> > > Typically, on our setup we observed 10% less power consumption with
> > > some use-cases in which the CPU goes to power collapse frequently.
> > > For example, when playing audio on an SoC which has a HW-based audio
> > > encoder/decoder, the CPU remains idle for long durations.  This idle
> > > state saves significant CPU power if KSM doesn't wake the CPUs up
> > > periodically.
> > >
> > > Note that deferrable timers won't be deferred if any CPU is active
> > > and not in idle state.
> > >
> > > By default, the deferrable timer is enabled.  To disable it:
> > > $ echo 0 > /sys/kernel/mm/ksm/deferrable_timer
> >
> > I have now experimented.  And, much as I wanted to eliminate the
> > tunable, and just have deferrable timers on, I have come right back
> > to your original position.
> >
> > I was impressed by how quiet ksmd goes when there's nothing much
> > happening on the machine; but equally disappointed in how slow it
> > then is to fulfil the outstanding merge work.  I agree with your
> > original assessment that not everybody will want the deferrable
> > timer, the way it is working at present.
> >
> > I expect that can be fixed, partly by doing more work on wakeup from
> > a deferred timer, according to how long it has been deferred; and
> > partly by not deferring on idle until two passes of the list have
> > been completed.  But that's easier said than done, and might turn
> > out to
>
> So why not have the timer cancel itself when there is no more work to
> do and start itself up again when there's work added?

Well, yes, but... how do we know when there is no more work to do?
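(For concreteness, the kind of sleep being discussed might look like the
rough sketch below -- this is not Chintan's actual patch; the names
ksm_sleep_deferrable, ksm_sleep_timer and ksm_iter_wait are illustrative,
assuming the 3.17-era timer API.  A deferrable timer still fires promptly
while any CPU is busy, but will not wake an idle CPU just to rearm the
scan:

/*
 * Sketch only: replace ksmd's fixed schedule_timeout_interruptible()
 * sleep with a deferrable timer, so an idle CPU is not woken merely
 * to restart scanning.
 */
#include <linux/jiffies.h>
#include <linux/timer.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(ksm_iter_wait);
static struct timer_list ksm_sleep_timer;
static bool ksm_sleep_expired;

static void ksm_sleep_timeout(unsigned long data)
{
	ksm_sleep_expired = true;
	wake_up_interruptible(&ksm_iter_wait);
}

static void ksm_sleep_deferrable(unsigned int msecs)
{
	ksm_sleep_expired = false;
	init_timer_deferrable(&ksm_sleep_timer);
	ksm_sleep_timer.function = ksm_sleep_timeout;
	ksm_sleep_timer.expires = jiffies + msecs_to_jiffies(msecs);
	add_timer(&ksm_sleep_timer);
	/* while all CPUs are idle, the expiry slides until one wakes */
	wait_event_interruptible(ksm_iter_wait, ksm_sleep_expired);
	del_timer_sync(&ksm_sleep_timer);
}

ksm_scan_thread() would then call ksm_sleep_deferrable() in place of its
schedule_timeout_interruptible(), with everything else unchanged.)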
Further down I said:

> > But fixing that might require ksm hooks in hot locations where nobody
> > else would want them: I'm rather hoping we can strike a good enough
> > balance with your deferrable timer, that nobody will need any better.

Thomas has given a reason why KSM might simply fail to do its job if we
rely on the deferrable timer.  So I've tried another approach, patch
below; but I do not expect you to jump for joy at the sight of it!

I've tried to minimize the offensive KSM hook in context_switch().
Why place it there, rather than do something near profile_tick() or
account_process_tick()?  Because KSM is aware of mms not tasks, and
context_switch() should have the next mm cachelines hot (if not, a
slight regrouping in mm_struct should do it); whereas I can find no
reference whatever to mm_struct in kernel/time, so hooking to KSM from
there would drag in another few cachelines every tick.

(Another approach would be to set up KSM hint faulting, along the lines
of NUMA hint faulting.  Not a path I'm keen to go down.)

I'm not thrilled with this patch; I think it's somewhat defective in
several ways.  But maybe in practice it will prove good enough, and if
so then I'd rather not waste effort on complicating it.  My own testing
is not realistic, nor representative of real KSM users; and I have no
idea what values of pages_to_scan and sleep_millisecs people really use
(those may make quite a difference to how well it works).

Chintan, even if the scheduler guys turn out to hate it, please would
you give the patch below a try, to see how well it works in your
environment, and whether it seems to go better or worse than your own
patch.  If it works well enough for you, maybe we can come up with ideas
to make it more palatable.  I do think your issue is an important one to
fix, one way or another.

Thanks,
Hugh

[PATCH] ksm: avoid periodic wakeup while mergeable mms are quiet

Description yet to be written!
Reported-by: Chintan Pandya <cpandya@codeaurora.org>
Not-Signed-off-by: Hugh Dickins
---
 include/linux/ksm.h   | 14 +++++++++++
 include/linux/sched.h |  1 
 kernel/sched/core.c   |  9 ++++++-
 mm/ksm.c              | 50 ++++++++++++++++++++++++++++------------
 4 files changed, 58 insertions(+), 16 deletions(-)

--- 3.17-rc4/include/linux/ksm.h	2014-03-30 20:40:15.000000000 -0700
+++ linux/include/linux/ksm.h	2014-09-07 11:54:41.528003316 -0700
@@ -12,6 +12,7 @@
 #include <linux/pagemap.h>
 #include <linux/rmap.h>
 #include <linux/sched.h>
+#include <linux/wait.h>
 
 struct stable_node;
 struct mem_cgroup;
@@ -21,6 +22,7 @@ int ksm_madvise(struct vm_area_struct *v
 		unsigned long end, int advice, unsigned long *vm_flags);
 int __ksm_enter(struct mm_struct *mm);
 void __ksm_exit(struct mm_struct *mm);
+wait_queue_head_t *__ksm_switch(struct mm_struct *mm);
 
 static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
 {
@@ -35,6 +37,13 @@ static inline void ksm_exit(struct mm_st
 		__ksm_exit(mm);
 }
 
+static inline wait_queue_head_t *ksm_switch(struct mm_struct *mm)
+{
+	if (unlikely(test_bit(MMF_SWITCH_TO_KSM, &mm->flags)))
+		return __ksm_switch(mm);
+	return NULL;
+}
+
 /*
  * A KSM page is one of those write-protected "shared pages" or "merged pages"
  * which KSM maps into multiple mms, wherever identical anonymous page content
@@ -87,6 +96,11 @@ static inline void ksm_exit(struct mm_st
 {
 }
 
+static inline wait_queue_head_t *ksm_switch(struct mm_struct *mm)
+{
+	return NULL;
+}
+
 static inline int PageKsm(struct page *page)
 {
 	return 0;
--- 3.17-rc4/include/linux/sched.h	2014-08-16 16:00:53.909189060 -0700
+++ linux/include/linux/sched.h	2014-09-07 11:54:41.528003316 -0700
@@ -453,6 +453,7 @@ static inline int get_dumpable(struct mm
 
 #define MMF_HAS_UPROBES		19	/* has uprobes */
 #define MMF_RECALC_UPROBES	20	/* MMF_HAS_UPROBES can be wrong */
+#define MMF_SWITCH_TO_KSM	21	/* notify KSM of switch to this mm */
 
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK)
 
--- 3.17-rc4/kernel/sched/core.c	2014-08-16 16:00:54.062189063 -0700
+++ linux/kernel/sched/core.c	2014-09-07 11:54:41.528003316 -0700
@@ -61,6 +61,7 @@
 #include
 #include
 #include
+#include <linux/ksm.h>
 #include
 #include
 #include
@@ -2304,6 +2305,7 @@ context_switch(struct rq *rq, struct tas
 	       struct task_struct *next)
 {
 	struct mm_struct *mm, *oldmm;
+	wait_queue_head_t *wake_ksm = NULL;
 
 	prepare_task_switch(rq, prev, next);
 
@@ -2320,8 +2322,10 @@ context_switch(struct rq *rq, struct tas
 		next->active_mm = oldmm;
 		atomic_inc(&oldmm->mm_count);
 		enter_lazy_tlb(oldmm, next);
-	} else
+	} else {
 		switch_mm(oldmm, mm, next);
+		wake_ksm = ksm_switch(mm);
+	}
 
 	if (!prev->mm) {
 		prev->active_mm = NULL;
@@ -2348,6 +2352,9 @@ context_switch(struct rq *rq, struct tas
 	 * frame will be invalid.
 	 */
 	finish_task_switch(this_rq(), prev);
+
+	if (wake_ksm)
+		wake_up_interruptible(wake_ksm);
 }
 
 /*
--- 3.17-rc4/mm/ksm.c	2014-08-16 16:00:54.132189065 -0700
+++ linux/mm/ksm.c	2014-09-07 11:54:41.528003316 -0700
@@ -205,6 +205,9 @@ static struct kmem_cache *rmap_item_cach
 static struct kmem_cache *stable_node_cache;
 static struct kmem_cache *mm_slot_cache;
 
+/* The number of mergeable mms which have recently run */
+static atomic_t active_mergeable_mms = ATOMIC_INIT(0);
+
 /* The number of nodes in the stable tree */
 static unsigned long ksm_pages_shared;
 
@@ -313,9 +316,13 @@ static inline struct mm_slot *alloc_mm_s
 	return kmem_cache_zalloc(mm_slot_cache, GFP_KERNEL);
 }
 
-static inline void free_mm_slot(struct mm_slot *mm_slot)
+static void free_mm_slot(struct mm_struct *mm, struct mm_slot *mm_slot)
 {
 	kmem_cache_free(mm_slot_cache, mm_slot);
+
+	clear_bit(MMF_VM_MERGEABLE, &mm->flags);
+	if (!test_and_clear_bit(MMF_SWITCH_TO_KSM, &mm->flags))
+		atomic_dec(&active_mergeable_mms);
 }
 
 static struct mm_slot *get_mm_slot(struct mm_struct *mm)
@@ -801,8 +808,7 @@ static int unmerge_and_remove_all_rmap_i
 			list_del(&mm_slot->mm_list);
 			spin_unlock(&ksm_mmlist_lock);
 
-			free_mm_slot(mm_slot);
-			clear_bit(MMF_VM_MERGEABLE, &mm->flags);
+			free_mm_slot(mm, mm_slot);
 			up_read(&mm->mmap_sem);
 			mmdrop(mm);
 		} else {
@@ -1668,12 +1674,20 @@ next_mm:
 		list_del(&slot->mm_list);
 		spin_unlock(&ksm_mmlist_lock);
 
-		free_mm_slot(slot);
-		clear_bit(MMF_VM_MERGEABLE, &mm->flags);
+		free_mm_slot(mm, slot);
 		up_read(&mm->mmap_sem);
 		mmdrop(mm);
 	} else {
 		spin_unlock(&ksm_mmlist_lock);
+		/*
+		 * After completing its scan, assume this mm to be inactive,
+		 * but set a flag for context_switch() to notify us as soon
+		 * as it is used again: see ksm_switch().  If the number of
+		 * active_mergeable_mms goes down to zero, ksmd will sleep
+		 * to save power, until awoken by mergeable context_switch().
+		 */
+		if (!test_and_set_bit(MMF_SWITCH_TO_KSM, &mm->flags))
+			atomic_dec(&active_mergeable_mms);
 		up_read(&mm->mmap_sem);
 	}
 
@@ -1707,7 +1721,7 @@ static void ksm_do_scan(unsigned int sca
 
 static int ksmd_should_run(void)
 {
-	return (ksm_run & KSM_RUN_MERGE) && !list_empty(&ksm_mm_head.mm_list);
+	return (ksm_run & KSM_RUN_MERGE) && atomic_read(&active_mergeable_mms);
 }
 
 static int ksm_scan_thread(void *nothing)
@@ -1785,15 +1799,11 @@ int ksm_madvise(struct vm_area_struct *v
 int __ksm_enter(struct mm_struct *mm)
 {
 	struct mm_slot *mm_slot;
-	int needs_wakeup;
 
 	mm_slot = alloc_mm_slot();
 	if (!mm_slot)
 		return -ENOMEM;
 
-	/* Check ksm_run too?  Would need tighter locking */
-	needs_wakeup = list_empty(&ksm_mm_head.mm_list);
-
 	spin_lock(&ksm_mmlist_lock);
 	insert_to_mm_slots_hash(mm, mm_slot);
 	/*
@@ -1812,10 +1822,9 @@ int __ksm_enter(struct mm_struct *mm)
 	list_add_tail(&mm_slot->mm_list, &ksm_scan.mm_slot->mm_list);
 	spin_unlock(&ksm_mmlist_lock);
 
-	set_bit(MMF_VM_MERGEABLE, &mm->flags);
 	atomic_inc(&mm->mm_count);
-
-	if (needs_wakeup)
+	set_bit(MMF_VM_MERGEABLE, &mm->flags);
+	if (atomic_inc_return(&active_mergeable_mms) == 1)
 		wake_up_interruptible(&ksm_thread_wait);
 
 	return 0;
@@ -1850,8 +1859,7 @@ void __ksm_exit(struct mm_struct *mm)
 	spin_unlock(&ksm_mmlist_lock);
 
 	if (easy_to_free) {
-		free_mm_slot(mm_slot);
-		clear_bit(MMF_VM_MERGEABLE, &mm->flags);
+		free_mm_slot(mm, mm_slot);
 		mmdrop(mm);
 	} else if (mm_slot) {
 		down_write(&mm->mmap_sem);
@@ -1859,6 +1867,18 @@ void __ksm_exit(struct mm_struct *mm)
 	}
 }
 
+wait_queue_head_t *__ksm_switch(struct mm_struct *mm)
+{
+	/*
+	 * Called by context_switch() to a hitherto inactive mergeable mm:
+	 * scheduler locks forbid immediate wakeup so leave that to caller.
+	 */
+	if (test_and_clear_bit(MMF_SWITCH_TO_KSM, &mm->flags) &&
+	    atomic_inc_return(&active_mergeable_mms) == 1)
+		return &ksm_thread_wait;
+	return NULL;
+}
+
 struct page *ksm_might_need_to_copy(struct page *page,
 			struct vm_area_struct *vma, unsigned long address)
 {
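P.S. For anyone wanting to exercise either patch: the easiest way to put
a mergeable mm in front of ksmd is madvise(MADV_MERGEABLE) from
userspace, the path that ends in __ksm_enter().  A minimal illustrative
test program (assuming CONFIG_KSM=y, and ksmd enabled beforehand with
"echo 1 > /sys/kernel/mm/ksm/run"):

/*
 * Map an anonymous region, fill it with identical pages, and mark it
 * mergeable so this process's mm joins KSM's scan list.  While it
 * sleeps, /sys/kernel/mm/ksm/pages_sharing should grow.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t len = 64 * 4096;		/* 64 pages */
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(buf, 0x5a, len);		/* identical content for ksmd to merge */
	if (madvise(buf, len, MADV_MERGEABLE)) {
		perror("madvise(MADV_MERGEABLE)");
		return 1;
	}
	pause();			/* leave time for ksmd to scan */
	return 0;
}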