From patchwork Thu Oct 25 08:24:01 2018
X-Patchwork-Submitter: Michal Hocko
X-Patchwork-Id: 10655591
From: Michal Hocko
Cc: Tetsuo Handa, Roman Gushchin, David Rientjes, Andrew Morton, LKML, Michal Hocko
Subject: [RFC PATCH v2 1/3] mm, oom: rework exit_mmap vs. oom_reaper synchronization
Date: Thu, 25 Oct 2018 10:24:01 +0200
Message-Id: <20181025082403.3806-2-mhocko@kernel.org>
In-Reply-To: <20181025082403.3806-1-mhocko@kernel.org>
References: <20181025082403.3806-1-mhocko@kernel.org>

From: Michal Hocko

The oom_reaper cannot handle mlocked vmas right now, so exit_mmap has
to reap the memory before it clears the mlock flags on the mappings.
This works, but we would like a better hand-over protocol between the
oom_reaper and exit_mmap paths. Therefore take mmap_sem for write in
exit_mmap whenever exit_mmap has to synchronize with the oom_reaper.
There are two notable places: mlocked vmas (munlock_vma_pages_all) and
the page table teardown path. All others should be fine to race with
oom_reap_task_mm.
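For illustration, the intended hand-over boils down to the following
simplified sketch (a condensed reading of the diff below, not code that
appears verbatim in the patch):

        /* exit_mmap() side, oom victim only */
        if (oom)
                down_write(&mm->mmap_sem);      /* blocks oom_reap_task_mm */
        munlock_vma_pages_all(vma);             /* clears VM_LOCKED */
        if (oom)
                up_write(&mm->mmap_sem);        /* the reaper may run again */

        /* oom_reaper side, unchanged logic */
        if (!down_read_trylock(&mm->mmap_sem))
                return false;                   /* exit_mmap is busy, retry */
        if (test_bit(MMF_OOM_SKIP, &mm->flags))
                goto out_unlock;                /* exit_mmap took over */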
This is mostly a preparatory patch which shouldn't introduce any
functional changes.

Changes since RFC
- move MMF_OOM_SKIP in exit_mmap to before we are going to free page
  tables

Signed-off-by: Michal Hocko
---
 include/linux/oom.h |  2 --
 mm/mmap.c           | 50 ++++++++++++++++++++++-----------------------
 mm/oom_kill.c       |  4 ++--
 3 files changed, 27 insertions(+), 29 deletions(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 69864a547663..11e26ca565a7 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -95,8 +95,6 @@ static inline vm_fault_t check_stable_address_space(struct mm_struct *mm)
 	return 0;
 }
 
-bool __oom_reap_task_mm(struct mm_struct *mm);
-
 extern unsigned long oom_badness(struct task_struct *p,
 		struct mem_cgroup *memcg, const nodemask_t *nodemask,
 		unsigned long totalpages);
diff --git a/mm/mmap.c b/mm/mmap.c
index 5f2b2b184c60..a02b314c0546 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3042,39 +3042,29 @@ void exit_mmap(struct mm_struct *mm)
 	struct mmu_gather tlb;
 	struct vm_area_struct *vma;
 	unsigned long nr_accounted = 0;
+	bool oom = mm_is_oom_victim(mm);
 
 	/* mm's last user has gone, and its about to be pulled down */
 	mmu_notifier_release(mm);
 
-	if (unlikely(mm_is_oom_victim(mm))) {
-		/*
-		 * Manually reap the mm to free as much memory as possible.
-		 * Then, as the oom reaper does, set MMF_OOM_SKIP to disregard
-		 * this mm from further consideration. Taking mm->mmap_sem for
-		 * write after setting MMF_OOM_SKIP will guarantee that the oom
-		 * reaper will not run on this mm again after mmap_sem is
-		 * dropped.
-		 *
-		 * Nothing can be holding mm->mmap_sem here and the above call
-		 * to mmu_notifier_release(mm) ensures mmu notifier callbacks in
-		 * __oom_reap_task_mm() will not block.
-		 *
-		 * This needs to be done before calling munlock_vma_pages_all(),
-		 * which clears VM_LOCKED, otherwise the oom reaper cannot
-		 * reliably test it.
-		 */
-		(void)__oom_reap_task_mm(mm);
-
-		set_bit(MMF_OOM_SKIP, &mm->flags);
-		down_write(&mm->mmap_sem);
-		up_write(&mm->mmap_sem);
-	}
-
 	if (mm->locked_vm) {
 		vma = mm->mmap;
 		while (vma) {
-			if (vma->vm_flags & VM_LOCKED)
+			if (vma->vm_flags & VM_LOCKED) {
+				/*
+				 * oom_reaper cannot handle mlocked vmas but we
+				 * need to serialize it with munlock_vma_pages_all
+				 * which clears VM_LOCKED, otherwise the oom reaper
+				 * cannot reliably test it.
+				 */
+				if (oom)
+					down_write(&mm->mmap_sem);
+
 				munlock_vma_pages_all(vma);
+
+				if (oom)
+					up_write(&mm->mmap_sem);
+			}
 			vma = vma->vm_next;
 		}
 	}
@@ -3091,6 +3081,13 @@ void exit_mmap(struct mm_struct *mm)
 	/* update_hiwater_rss(mm) here? but nobody should be looking */
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
 	unmap_vmas(&tlb, vma, 0, -1);
+
+	/* oom_reaper cannot race with the page tables teardown */
+	if (oom) {
+		down_write(&mm->mmap_sem);
+		set_bit(MMF_OOM_SKIP, &mm->flags);
+	}
+
 	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
 	tlb_finish_mmu(&tlb, 0, -1);
@@ -3104,6 +3101,9 @@ void exit_mmap(struct mm_struct *mm)
 		vma = remove_vma(vma);
 	}
 	vm_unacct_memory(nr_accounted);
+
+	if (oom)
+		up_write(&mm->mmap_sem);
 }
 
 /* Insert vm structure into process list sorted by address
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index f10aa5360616..b3b2c2bbd8ab 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -488,7 +488,7 @@ static DECLARE_WAIT_QUEUE_HEAD(oom_reaper_wait);
 static struct task_struct *oom_reaper_list;
 static DEFINE_SPINLOCK(oom_reaper_lock);
 
-bool __oom_reap_task_mm(struct mm_struct *mm)
+static bool __oom_reap_task_mm(struct mm_struct *mm)
 {
 	struct vm_area_struct *vma;
 	bool ret = true;
@@ -554,7 +554,7 @@ static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 	 * MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't
 	 * work on the mm anymore. The check for MMF_OOM_SKIP must run
 	 * under mmap_sem for reading because it serializes against the
-	 * down_write();up_write() cycle in exit_mmap().
+	 * down_write() in exit_mmap().
 	 */
 	if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
 		trace_skip_task_reaping(tsk->pid);
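The serialization argument, restated as a timeline (a reader's summary,
not text from the patch):

        /*
         * exit_mmap()                  oom_reap_task_mm()
         * -----------                  ------------------
         * down_write(mmap_sem)         down_read_trylock(mmap_sem)
         * set_bit(MMF_OOM_SKIP)          -> fails while the write lock
         * free_pgtables()                  is held, so the reaper never
         * ...                              sees the page tables half
         * up_write(mmap_sem)               torn down; once it succeeds,
         *                                  MMF_OOM_SKIP is already set
         *                                  and the reaper bails out
         */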
From patchwork Thu Oct 25 08:24:02 2018
X-Patchwork-Submitter: Michal Hocko
X-Patchwork-Id: 10655593
From: Michal Hocko
Cc: Tetsuo Handa, Roman Gushchin, David Rientjes, Andrew Morton, LKML, Michal Hocko
Subject: [RFC PATCH v2 2/3] mm, oom: keep retrying the oom_reap operation as long as there is substantial memory left
Date: Thu, 25 Oct 2018 10:24:02 +0200
Message-Id: <20181025082403.3806-3-mhocko@kernel.org>
In-Reply-To: <20181025082403.3806-1-mhocko@kernel.org>
References: <20181025082403.3806-1-mhocko@kernel.org>

From: Michal Hocko

The oom_reaper is not able to reap all types of memory, e.g. mlocked
mappings or page tables. In some cases this can be a lot of memory and
we rely on exit_mmap to release it. Yet we cannot rely on exit_mmap to
set MMF_OOM_SKIP right now because there are several places on that
path where sleeping locks are taken.

This patch adds a simple heuristic which checks how much memory the mm
is still sitting on after the oom_reaper is done with it. If this is
still a considerable portion of the original memory (more than 1/4 of
the original badness), simply keep retrying oom_reap_task_mm. The
threshold is quite arbitrary and subject to future tuning based on
real-life use cases, but it should catch outliers where the vast
majority of the memory is consumed by non-reapable mappings.
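As a worked (hypothetical) example of the retry rule: if the victim
started with oom_badness_pages(mm) == 1000000 pages, the reaper keeps
retrying while more than 1000000 >> 2 == 250000 pages remain:

        /* sketch only; the real check lives in __oom_reap_task_mm() below */
        unsigned long original_badness = 1000000;       /* hypothetical */

        if (oom_badness_pages(mm) > (original_badness >> 2))
                ret = false;    /* e.g. 400000 pages left -> retry */
        else
                ret = true;     /* e.g. 200000 pages left -> done */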
Changes since RFC
- do not use a hardcoded threshold for the retry; use a portion of the
  original badness instead

Signed-off-by: Michal Hocko
---
 mm/oom_kill.c | 32 ++++++++++++++++++++++++++------
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index b3b2c2bbd8ab..ab42717661dc 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -189,6 +189,16 @@ static bool is_dump_unreclaim_slabs(void)
 	return (global_node_page_state(NR_SLAB_UNRECLAIMABLE) > nr_lru);
 }
 
+/*
+ * Rough memory consumption of the given mm which should be theoretically freed
+ * when the mm is removed.
+ */
+static unsigned long oom_badness_pages(struct mm_struct *mm)
+{
+	return get_mm_rss(mm) + get_mm_counter(mm, MM_SWAPENTS) +
+		mm_pgtables_bytes(mm) / PAGE_SIZE;
+}
+
 /**
  * oom_badness - heuristic function to determine which candidate task to kill
  * @p: task struct of which task we should calculate
@@ -230,8 +240,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 	 * The baseline for the badness score is the proportion of RAM that each
 	 * task's rss, pagetable and swap space use.
 	 */
-	points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
-		mm_pgtables_bytes(p->mm) / PAGE_SIZE;
+	points = oom_badness_pages(p->mm);
 	task_unlock(p);
 
 	/* Normalize to oom_score_adj units */
@@ -488,7 +497,7 @@ static DECLARE_WAIT_QUEUE_HEAD(oom_reaper_wait);
 static struct task_struct *oom_reaper_list;
 static DEFINE_SPINLOCK(oom_reaper_lock);
 
-static bool __oom_reap_task_mm(struct mm_struct *mm)
+static bool __oom_reap_task_mm(struct mm_struct *mm, unsigned long original_badness)
 {
 	struct vm_area_struct *vma;
 	bool ret = true;
@@ -532,6 +541,16 @@ static bool __oom_reap_task_mm(struct mm_struct *mm)
 		}
 	}
 
+	/*
+	 * If we still sit on a noticeable amount of memory even after successfully
+	 * reaping the address space then keep retrying until exit_mmap makes some
+	 * further progress.
+	 * TODO: add a flag for a stage when the exit path doesn't block anymore
+	 * and hand over MMF_OOM_SKIP handling there in that case
+	 */
+	if (oom_badness_pages(mm) > (original_badness >> 2))
+		ret = false;
+
 	return ret;
 }
 
@@ -541,7 +560,7 @@ static bool __oom_reap_task_mm(struct mm_struct *mm)
  * Returns true on success and false if none or part of the address space
  * has been reclaimed and the caller should retry later.
  */
-static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
+static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm, unsigned long original_badness)
 {
 	bool ret = true;
@@ -564,7 +583,7 @@ static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 	trace_start_task_reaping(tsk->pid);
 
 	/* failed to reap part of the address space. Try again later */
-	ret = __oom_reap_task_mm(mm);
+	ret = __oom_reap_task_mm(mm, original_badness);
 	if (!ret)
 		goto out_finish;
 
@@ -586,9 +605,10 @@ static void oom_reap_task(struct task_struct *tsk)
 {
 	int attempts = 0;
 	struct mm_struct *mm = tsk->signal->oom_mm;
+	unsigned long original_badness = oom_badness_pages(mm);
 
 	/* Retry the down_read_trylock(mmap_sem) a few times */
-	while (attempts++ < MAX_OOM_REAP_RETRIES && !oom_reap_task_mm(tsk, mm))
+	while (attempts++ < MAX_OOM_REAP_RETRIES && !oom_reap_task_mm(tsk, mm, original_badness))
		schedule_timeout_idle(HZ/10);
 
 	if (attempts <= MAX_OOM_REAP_RETRIES ||
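Taken together with the existing retry loop, the reaper now waits HZ/10
(roughly 100ms) between attempts and gives up after MAX_OOM_REAP_RETRIES
tries (10 at the time of this series, if I read the tree correctly), so
a victim whose leftover memory never drops below the 1/4 threshold is
force-skipped after about one second.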
From patchwork Thu Oct 25 08:24:03 2018
X-Patchwork-Submitter: Michal Hocko
X-Patchwork-Id: 10655595
From: Michal Hocko
Cc: Tetsuo Handa, Roman Gushchin, David Rientjes, Andrew Morton, LKML, Michal Hocko
Subject: [RFC PATCH v2 3/3] mm, oom: hand over MMF_OOM_SKIP to exit path if it is guaranteed to finish
Date: Thu, 25 Oct 2018 10:24:03 +0200
Message-Id: <20181025082403.3806-4-mhocko@kernel.org>
In-Reply-To: <20181025082403.3806-1-mhocko@kernel.org>
References: <20181025082403.3806-1-mhocko@kernel.org>

From: Michal Hocko

David Rientjes has noted that certain userspace memory allocators leave
a lot of page tables behind, and the current implementation of the
oom_reaper doesn't deal with those workloads very well. In order to
improve such workloads, define a point at which exit_mmap is guaranteed
to finish the tear down without any further blocking. This is right
after we unlink the vmas (those still depend on locks which are held
while performing memory allocations from other contexts) and before we
start releasing the page tables.

Open code free_pgtables and explicitly unlink all vmas first. Then set
mm->mmap to NULL (nobody should be looking at it at this stage) and
check for mm->mmap in the oom_reaper path. If mm->mmap is NULL we rely
on the exit path to finish the job and won't set MMF_OOM_SKIP from the
reaper.
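Condensed into a sketch (a reader's summary of the diff below; the
__unlink_vmas/__free_pgtables helpers are the ones this patch
introduces):

        /* exit_mmap(), oom victim path */
        down_write(&mm->mmap_sem);
        __unlink_vmas(vma);     /* last step that may block on other locks */
        mm->mmap = NULL;        /* signal: the exit path finishes the job */
        up_write(&mm->mmap_sem);
        __free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
        /* ... */
        set_bit(MMF_OOM_SKIP, &mm->flags);      /* only after full teardown */

        /* oom_reaper side */
        if (!mm->mmap)          /* the exit path owns the teardown */
                goto out_unlock;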
Changes since RFC
- the task is still visible to the OOM killer after exit_mmap
  terminates, so we should set MMF_OOM_SKIP from that path to be sure
  the oom killer doesn't get stuck on this task (see d7a94e7e11badf84
  for more context) - per Tetsuo
- split free_pgtables into an unlinking part and the actual freeing
  part. We cannot rely on free_pgd_range because hugetlb pages on ppc
  and sparc do their own tear down

Signed-off-by: Michal Hocko
---
 mm/internal.h |  3 +++
 mm/memory.c   | 28 ++++++++++++++++++----------
 mm/mmap.c     | 25 +++++++++++++++++++++----
 mm/oom_kill.c | 13 +++++++------
 4 files changed, 49 insertions(+), 20 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 87256ae1bef8..35adbfec4935 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -40,6 +40,9 @@ void page_writeback_init(void);
 
 vm_fault_t do_swap_page(struct vm_fault *vmf);
 
+void __unlink_vmas(struct vm_area_struct *vma);
+void __free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
+		unsigned long floor, unsigned long ceiling);
 void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
 		unsigned long floor, unsigned long ceiling);
 
diff --git a/mm/memory.c b/mm/memory.c
index c467102a5cbc..cf910ed5f283 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -612,20 +612,23 @@ void free_pgd_range(struct mmu_gather *tlb,
 	} while (pgd++, addr = next, addr != end);
 }
 
-void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
+void __unlink_vmas(struct vm_area_struct *vma)
+{
+	while (vma) {
+		unlink_anon_vmas(vma);
+		unlink_file_vma(vma);
+		vma = vma->vm_next;
+	}
+}
+
+/* expects that __unlink_vmas has been called already */
+void __free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		unsigned long floor, unsigned long ceiling)
 {
 	while (vma) {
 		struct vm_area_struct *next = vma->vm_next;
 		unsigned long addr = vma->vm_start;
 
-		/*
-		 * Hide vma from rmap and truncate_pagecache before freeing
-		 * pgtables
-		 */
-		unlink_anon_vmas(vma);
-		unlink_file_vma(vma);
-
 		if (is_vm_hugetlb_page(vma)) {
 			hugetlb_free_pgd_range(tlb, addr, vma->vm_end,
 				floor, next ? next->vm_start : ceiling);
@@ -637,8 +640,6 @@ void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			       && !is_vm_hugetlb_page(next)) {
 				vma = next;
 				next = vma->vm_next;
-				unlink_anon_vmas(vma);
-				unlink_file_vma(vma);
 			}
 			free_pgd_range(tlb, addr, vma->vm_end,
 				floor, next ? next->vm_start : ceiling);
@@ -647,6 +648,13 @@ void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	}
 }
 
+void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
+		unsigned long floor, unsigned long ceiling)
+{
+	__unlink_vmas(vma);
+	__free_pgtables(tlb, vma, floor, ceiling);
+}
+
 int __pte_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long address)
 {
 	spinlock_t *ptl;
diff --git a/mm/mmap.c b/mm/mmap.c
index a02b314c0546..f4b562e21764 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3082,13 +3082,26 @@ void exit_mmap(struct mm_struct *mm)
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
 	unmap_vmas(&tlb, vma, 0, -1);
 
-	/* oom_reaper cannot race with the page tables teardown */
+	/*
+	 * oom_reaper cannot race with the page tables teardown but we
+	 * want to make sure that the exit path can take over the full
+	 * tear down when it is safe to do so
+	 */
 	if (oom) {
 		down_write(&mm->mmap_sem);
-		set_bit(MMF_OOM_SKIP, &mm->flags);
+		__unlink_vmas(vma);
+		/*
+		 * the exit path is guaranteed to finish the memory tear down
+		 * without any unbound blocking at this stage so make it clear
+		 * to the oom_reaper
+		 */
+		mm->mmap = NULL;
+		up_write(&mm->mmap_sem);
+		__free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
+	} else {
+		free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
 	}
 
-	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
 	tlb_finish_mmu(&tlb, 0, -1);
 
 	/*
@@ -3102,8 +3115,12 @@ void exit_mmap(struct mm_struct *mm)
 	}
 	vm_unacct_memory(nr_accounted);
 
+	/*
+	 * Now that the full address space is torn down, make sure the
+	 * OOM killer skips over this task
+	 */
 	if (oom)
-		up_write(&mm->mmap_sem);
+		set_bit(MMF_OOM_SKIP, &mm->flags);
 }
 
 /* Insert vm structure into process list sorted by address
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ab42717661dc..db1ebb45c66a 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -570,12 +570,10 @@ static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm, unsi
 	}
 
 	/*
-	 * MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't
-	 * work on the mm anymore. The check for MMF_OOM_SKIP must run
-	 * under mmap_sem for reading because it serializes against the
-	 * down_write() in exit_mmap().
+	 * If the exit path cleared mm->mmap then we know it will finish
+	 * the tear down and we can bail out here.
 	 */
-	if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
+	if (!mm->mmap) {
 		trace_skip_task_reaping(tsk->pid);
 		goto out_unlock;
 	}
@@ -625,8 +623,11 @@ static void oom_reap_task(struct task_struct *tsk)
 	/*
 	 * Hide this mm from OOM killer because it has been either reaped or
 	 * somebody can't call up_write(mmap_sem).
+	 * Leave MMF_OOM_SKIP to the exit path if it managed to reach the
+	 * point where it is guaranteed to finish without any blocking
 	 */
-	set_bit(MMF_OOM_SKIP, &mm->flags);
+	if (mm->mmap)
+		set_bit(MMF_OOM_SKIP, &mm->flags);
 
 	/* Drop a reference taken by wake_oom_reaper */
 	put_task_struct(tsk);
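Across the series, the reaper-side decision can be summarized as follows
(a reader's summary, not part of any patch):

        /*
         * mm->mmap != NULL, little memory freed -> retry (patch 2)
         * mm->mmap != NULL, enough memory freed -> set MMF_OOM_SKIP
         * mm->mmap == NULL                      -> the exit path sets
         *                                          MMF_OOM_SKIP itself
         *                                          (patch 3)
         */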