From patchwork Tue Jul 3 14:25:02 2018
X-Patchwork-Submitter: Tetsuo Handa
X-Patchwork-Id: 10504173
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org, Tetsuo Handa, David Rientjes,
    Johannes Weiner, Michal Hocko, Roman Gushchin, Tejun Heo,
    Vladimir Davydov
Subject: [PATCH 1/8] mm, oom: Don't call schedule_timeout_killable() with oom_lock held.
Date: Tue, 3 Jul 2018 23:25:02 +0900
Message-Id: <1530627910-3415-2-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1530627910-3415-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
References: <1530627910-3415-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>

While examining a bug that occurs under combined CPU and memory pressure, I
observed that a thread which called out_of_memory() can sleep for minutes in
schedule_timeout_killable(1) with oom_lock held when many threads are doing
direct reclaim. The whole point of that sleep is to give the OOM victim some
time to exit. But since commit 27ae357fa82be5ab ("mm, oom: fix concurrent
munlock and oom reaper unmap, v3") changed the OOM victim to wait for
oom_lock in order to close a race window at exit_mmap(), the sleep no longer
serves its purpose: the victim cannot make progress while the sleeper holds
the lock. We need to make sure that the thread which called out_of_memory()
releases oom_lock promptly. Therefore, this patch moves the sleep outside of
the oom_lock-protected path. Although the sleep will eventually be removed
by the last patch in this series, this patch is kept simple for ease of
backporting to stable kernels, since we are still waiting for patches that
can mitigate CVE-2016-10723.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Mitigates: CVE-2016-10723
Cc: Roman Gushchin
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Vladimir Davydov
Cc: David Rientjes
Cc: Tejun Heo
---
 mm/oom_kill.c   | 38 +++++++++++++++++---------------------
 mm/page_alloc.c |  7 ++++++-
 2 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 84081e7..d3fb4e4 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -479,6 +479,21 @@ bool process_shares_mm(struct task_struct *p, struct mm_struct *mm)
 static struct task_struct *oom_reaper_list;
 static DEFINE_SPINLOCK(oom_reaper_lock);
 
+/*
+ * We have to make sure not to cause premature new oom victim selection.
+ *
+ * __alloc_pages_may_oom()             oom_reap_task_mm()/exit_mmap()
+ *   mutex_trylock(&oom_lock)
+ *   get_page_from_freelist(ALLOC_WMARK_HIGH) # fails
+ *                                       unmap_page_range() # frees some memory
+ *                                       set_bit(MMF_OOM_SKIP)
+ *   out_of_memory()
+ *     select_bad_process()
+ *       test_bit(MMF_OOM_SKIP)          # selects new oom victim
+ *   mutex_unlock(&oom_lock)
+ *
+ * Therefore, the callers hold oom_lock when calling this function.
+ */
 void __oom_reap_task_mm(struct mm_struct *mm)
 {
 	struct vm_area_struct *vma;
@@ -523,20 +538,6 @@ static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 {
 	bool ret = true;
 
-	/*
-	 * We have to make sure to not race with the victim exit path
-	 * and cause premature new oom victim selection:
-	 * oom_reap_task_mm		exit_mm
-	 *   mmget_not_zero
-	 *				  mmput
-	 *				    atomic_dec_and_test
-	 *				      exit_oom_victim
-	 *				    [...]
-	 *				    out_of_memory
-	 *				      select_bad_process
-	 *					# no TIF_MEMDIE task selects new victim
-	 *  unmap_page_range # frees some memory
-	 */
 	mutex_lock(&oom_lock);
 
 	if (!down_read_trylock(&mm->mmap_sem)) {
@@ -1077,15 +1078,9 @@ bool out_of_memory(struct oom_control *oc)
 		dump_header(oc, NULL);
 		panic("Out of memory and no killable processes...\n");
 	}
-	if (oc->chosen && oc->chosen != (void *)-1UL) {
+	if (oc->chosen && oc->chosen != (void *)-1UL)
 		oom_kill_process(oc, !is_memcg_oom(oc) ?
 				 "Out of memory" : "Memory cgroup out of memory");
-		/*
-		 * Give the killed process a good chance to exit before trying
-		 * to allocate memory again.
-		 */
-		schedule_timeout_killable(1);
-	}
 	return !!oc->chosen;
 }
 
@@ -1111,4 +1106,5 @@ void pagefault_out_of_memory(void)
 		return;
 	out_of_memory(&oc);
 	mutex_unlock(&oom_lock);
+	schedule_timeout_killable(1);
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100..6205d34 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3468,7 +3468,6 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...)
 	 */
 	if (!mutex_trylock(&oom_lock)) {
 		*did_some_progress = 1;
-		schedule_timeout_uninterruptible(1);
 		return NULL;
 	}
@@ -4244,6 +4243,12 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
 	/* Retry as long as the OOM killer is making progress */
 	if (did_some_progress) {
 		no_progress_loops = 0;
+		/*
+		 * This schedule_timeout_*() serves as a guaranteed sleep for
+		 * PF_WQ_WORKER threads when __zone_watermark_ok() == false.
+		 */
+		if (!tsk_is_oom_victim(current))
+			schedule_timeout_uninterruptible(1);
 		goto retry;
 	}