From patchwork Thu Oct 25 08:24:03 2018
X-Patchwork-Submitter: Michal Hocko
X-Patchwork-Id: 10655595
From: Michal Hocko
Cc: Tetsuo Handa, Roman Gushchin, David Rientjes, Andrew Morton, LKML,
 Michal Hocko
Subject: [RFC PATCH v2 3/3] mm, oom: hand over MMF_OOM_SKIP to exit path if it is guaranteed to finish
Date: Thu, 25 Oct 2018 10:24:03 +0200
Message-Id: <20181025082403.3806-4-mhocko@kernel.org>
X-Mailer: git-send-email 2.19.1
In-Reply-To: <20181025082403.3806-1-mhocko@kernel.org>
References: <20181025082403.3806-1-mhocko@kernel.org>

From: Michal Hocko

David Rientjes has noted that certain user space memory allocators leave
a lot of page tables behind and that the current oom_reaper implementation
doesn't deal with such workloads very well. In order to improve these
workloads, define a point at which exit_mmap is guaranteed to finish the
tear down without any further blocking. This is right after we unlink the
vmas (those still depend on locks which are held while performing memory
allocations from other contexts) and before we start releasing page
tables.

Opencode free_pgtables and explicitly unlink all vmas first. Then set
mm->mmap to NULL (there shouldn't be anybody looking at it at this stage)
and check for mm->mmap in the oom_reaper path. If mm->mmap is NULL we
rely on the exit path and won't set MMF_OOM_SKIP from the reaper.
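In other words, the hand-over protocol is roughly the following (a
simplified editorial sketch, not part of the patch; it leaves out the
mmu_gather setup, the vma freeing loop and error handling -- the diff
below is authoritative):

	/* exit_mmap(), oom victim case (simplified) */
	unmap_vmas(&tlb, vma, 0, -1);
	down_write(&mm->mmap_sem);
	__unlink_vmas(vma);		/* vmas unlinked from rmap and files */
	mm->mmap = NULL;		/* signal: exit path owns the rest */
	up_write(&mm->mmap_sem);
	__free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
	/* ... tlb_finish_mmu(), vma freeing ... */
	set_bit(MMF_OOM_SKIP, &mm->flags);	/* only after the tear down is done */

	/* oom_reap_task_mm(), under mmap_sem held for reading (simplified) */
	if (!mm->mmap)			/* exit path will finish the tear down */
		goto out_unlock;

The reaper therefore only sets MMF_OOM_SKIP itself while mm->mmap is
still non-NULL, i.e. while exit_mmap has not yet reached the point of no
further blocking.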
Changes since RFC
- the task is still visible to the OOM killer after exit_mmap terminates
  so we should set MMF_OOM_SKIP from that path to be sure the oom killer
  doesn't get stuck on this task (see d7a94e7e11badf84 for more context)
  - per Tetsuo
- split free_pgtables into unlinking and actual freeing part. We cannot
  rely on free_pgd_range because of hugetlb pages on ppc resp. sparc
  which do their own tear down

Signed-off-by: Michal Hocko
---
 mm/internal.h |  3 +++
 mm/memory.c   | 28 ++++++++++++++++++----------
 mm/mmap.c     | 25 +++++++++++++++++++++----
 mm/oom_kill.c | 13 +++++++------
 4 files changed, 49 insertions(+), 20 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 87256ae1bef8..35adbfec4935 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -40,6 +40,9 @@ void page_writeback_init(void);
 
 vm_fault_t do_swap_page(struct vm_fault *vmf);
 
+void __unlink_vmas(struct vm_area_struct *vma);
+void __free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
+		unsigned long floor, unsigned long ceiling);
 void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
 		unsigned long floor, unsigned long ceiling);
 
diff --git a/mm/memory.c b/mm/memory.c
index c467102a5cbc..cf910ed5f283 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -612,20 +612,23 @@ void free_pgd_range(struct mmu_gather *tlb,
 	} while (pgd++, addr = next, addr != end);
 }
 
-void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
+void __unlink_vmas(struct vm_area_struct *vma)
+{
+	while (vma) {
+		unlink_anon_vmas(vma);
+		unlink_file_vma(vma);
+		vma = vma->vm_next;
+	}
+}
+
+/* expects that __unlink_vmas has been called already */
+void __free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		unsigned long floor, unsigned long ceiling)
 {
 	while (vma) {
 		struct vm_area_struct *next = vma->vm_next;
 		unsigned long addr = vma->vm_start;
 
-		/*
-		 * Hide vma from rmap and truncate_pagecache before freeing
-		 * pgtables
-		 */
-		unlink_anon_vmas(vma);
-		unlink_file_vma(vma);
-
 		if (is_vm_hugetlb_page(vma)) {
 			hugetlb_free_pgd_range(tlb, addr, vma->vm_end,
 				floor, next ? next->vm_start : ceiling);
@@ -637,8 +640,6 @@ void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
 				&& !is_vm_hugetlb_page(next)) {
 				vma = next;
 				next = vma->vm_next;
-				unlink_anon_vmas(vma);
-				unlink_file_vma(vma);
 			}
 			free_pgd_range(tlb, addr, vma->vm_end,
 				floor, next ? next->vm_start : ceiling);
@@ -647,6 +648,13 @@ void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	}
 }
 
+void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
+		unsigned long floor, unsigned long ceiling)
+{
+	__unlink_vmas(vma);
+	__free_pgtables(tlb, vma, floor, ceiling);
+}
+
 int __pte_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long address)
 {
 	spinlock_t *ptl;
diff --git a/mm/mmap.c b/mm/mmap.c
index a02b314c0546..f4b562e21764 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3082,13 +3082,26 @@ void exit_mmap(struct mm_struct *mm)
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
 	unmap_vmas(&tlb, vma, 0, -1);
 
-	/* oom_reaper cannot race with the page tables teardown */
+	/*
+	 * oom_reaper cannot race with the page tables teardown but we
+	 * want to make sure that the exit path can take over the full
+	 * tear down when it is safe to do so
+	 */
 	if (oom) {
 		down_write(&mm->mmap_sem);
-		set_bit(MMF_OOM_SKIP, &mm->flags);
+		__unlink_vmas(vma);
+		/*
+		 * the exit path is guaranteed to finish the memory tear down
+		 * without any unbound blocking at this stage so make it clear
+		 * to the oom_reaper
+		 */
+		mm->mmap = NULL;
+		up_write(&mm->mmap_sem);
+		__free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
+	} else {
+		free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
 	}
-	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
 	tlb_finish_mmu(&tlb, 0, -1);
 
 	/*
@@ -3102,8 +3115,12 @@ void exit_mmap(struct mm_struct *mm)
 	}
 	vm_unacct_memory(nr_accounted);
 
+	/*
+	 * Now that the full address space is torn down, make sure the
+	 * OOM killer skips over this task
+	 */
 	if (oom)
-		up_write(&mm->mmap_sem);
+		set_bit(MMF_OOM_SKIP, &mm->flags);
 }
 
 /* Insert vm structure into process list sorted by address
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ab42717661dc..db1ebb45c66a 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -570,12 +570,10 @@ static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm, unsi
 	}
 
 	/*
-	 * MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't
-	 * work on the mm anymore. The check for MMF_OOM_SKIP must run
-	 * under mmap_sem for reading because it serializes against the
-	 * down_write() in exit_mmap().
+	 * If the exit path has cleared mm->mmap then we know it will finish
+	 * the tear down and we can bail out here.
 	 */
-	if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
+	if (!mm->mmap) {
 		trace_skip_task_reaping(tsk->pid);
 		goto out_unlock;
 	}
@@ -625,8 +623,11 @@ static void oom_reap_task(struct task_struct *tsk)
 	/*
 	 * Hide this mm from OOM killer because it has been either reaped or
 	 * somebody can't call up_write(mmap_sem).
+	 * Leave the MMF_OOM_SKIP to the exit path if it managed to reach the
+	 * point it is guaranteed to finish without any blocking
 	 */
-	set_bit(MMF_OOM_SKIP, &mm->flags);
+	if (mm->mmap)
+		set_bit(MMF_OOM_SKIP, &mm->flags);
 
 	/* Drop a reference taken by wake_oom_reaper */
 	put_task_struct(tsk);