From patchwork Mon Jul 18 08:41:24 2016
X-Patchwork-Submitter: Michal Hocko
X-Patchwork-Id: 9234201
X-Patchwork-Delegate: snitzer@redhat.com
From: Michal Hocko
Date: Mon, 18 Jul 2016 10:41:24 +0200
Message-Id: <1468831285-27242-1-git-send-email-mhocko@kernel.org>
In-Reply-To: <1468831164-26621-1-git-send-email-mhocko@kernel.org>
References: <1468831164-26621-1-git-send-email-mhocko@kernel.org>
Cc: Michal Hocko, Tetsuo Handa, LKML, dm-devel@redhat.com, Mikulas Patocka, Mel Gorman, David Rientjes, Ondrej Kozina, Andrew Morton
Subject: [dm-devel] [RFC PATCH 1/2] mempool: do not consume memory reserves from the reclaim path
List-Id: device-mapper development

From: Michal Hocko

There has been a report about the OOM killer being invoked when swapping out
to a dm-crypt device. The primary reason seems to be that the swapout IO
managed to completely deplete the memory reserves. Mikulas was able to bisect
the problem and explained the issue by pointing to f9054c70d28b ("mm, mempool:
only set __GFP_NOMEMALLOC if there are free elements").

The reason is that the swapout path is not throttled properly because the
md-raid layer needs to allocate from the generic_make_request path, which
means it allocates from the PF_MEMALLOC context. The dm layer uses
mempool_alloc in order to guarantee forward progress, and mempool_alloc used
to inhibit access to memory reserves when using the page allocator. This
changed with f9054c70d28b ("mm, mempool: only set __GFP_NOMEMALLOC if there
are free elements"), which dropped the __GFP_NOMEMALLOC protection when the
memory pool is depleted.

If we are running out of memory and the only way to free memory is to perform
swapout, we just keep consuming memory reserves rather than throttling the
mempool allocations and allowing the pending IO to complete, up to the moment
when memory is depleted completely and there is no way forward but to invoke
the OOM killer. This is less than optimal.

The original intention of f9054c70d28b was to help with OOM situations where
the OOM victim depends on a mempool allocation to make forward progress. We
can handle that case in a different way, though. We can check whether the
current task has access to memory reserves as an OOM victim (TIF_MEMDIE) and
drop the __GFP_NOMEMALLOC protection only if the pool is empty.

David Rientjes objected that such an approach wouldn't help if the OOM victim
was blocked on a lock held by a process doing mempool_alloc. This is very
similar to other OOM deadlock situations, and we have the oom_reaper to deal
with them, so it is reasonable to rely on the same mechanism rather than
inventing a different one which has negative side effects.
Fixes: f9054c70d28b ("mm, mempool: only set __GFP_NOMEMALLOC if there are free elements")
Bisected-by: Mikulas Patocka
Signed-off-by: Michal Hocko
Reviewed-by: Mikulas Patocka
Tested-by: Mikulas Patocka
---
 mm/mempool.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/mm/mempool.c b/mm/mempool.c
index 8f65464da5de..ea26d75c8adf 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -322,20 +322,20 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
 
 	might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
 
+	gfp_mask |= __GFP_NOMEMALLOC;	/* don't allocate emergency reserves */
 	gfp_mask |= __GFP_NORETRY;	/* don't loop in __alloc_pages */
 	gfp_mask |= __GFP_NOWARN;	/* failures are OK */
 
 	gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO);
 
 repeat_alloc:
-	if (likely(pool->curr_nr)) {
-		/*
-		 * Don't allocate from emergency reserves if there are
-		 * elements available. This check is racy, but it will
-		 * be rechecked each loop.
-		 */
-		gfp_temp |= __GFP_NOMEMALLOC;
-	}
+	/*
+	 * Make sure that the OOM victim will get access to memory reserves
+	 * properly if there are no objects in the pool to prevent from
+	 * livelocks.
+	 */
+	if (!likely(pool->curr_nr) && test_thread_flag(TIF_MEMDIE))
+		gfp_temp &= ~__GFP_NOMEMALLOC;
 
 	element = pool->alloc(gfp_temp, pool->pool_data);
 	if (likely(element != NULL))
@@ -359,7 +359,7 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
 	 * We use gfp mask w/o direct reclaim or IO for the first round. If
 	 * alloc failed with that and @pool was empty, retry immediately.
 	 */
-	if ((gfp_temp & ~__GFP_NOMEMALLOC) != gfp_mask) {
+	if ((gfp_temp & __GFP_DIRECT_RECLAIM) != (gfp_mask & __GFP_DIRECT_RECLAIM)) {
 		spin_unlock_irqrestore(&pool->lock, flags);
 		gfp_temp = gfp_mask;
 		goto repeat_alloc;
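
For reference, below is a minimal sketch of the mempool usage pattern the
changelog refers to when it says the dm layer relies on mempool_alloc to
guarantee forward progress. It is an illustration only: the pool name,
object size and reserve count are made up for the example, and it is not
taken from the actual dm code.

/*
 * Illustrative only: a driver pre-sizes a mempool at init time so that at
 * least EXAMPLE_POOL_MIN_NR objects are always available, then allocates
 * per-IO objects from it on the (possibly reclaim-driven) submission path.
 * All names below are hypothetical.
 */
#include <linux/mempool.h>
#include <linux/slab.h>
#include <linux/gfp.h>

#define EXAMPLE_POOL_MIN_NR	16	/* hypothetical reserved element count */
#define EXAMPLE_OBJ_SIZE	256	/* hypothetical per-IO object size */

static mempool_t *example_pool;

static int example_init(void)
{
	/* Pre-allocate the reserved elements backing the pool. */
	example_pool = mempool_create_kmalloc_pool(EXAMPLE_POOL_MIN_NR,
						   EXAMPLE_OBJ_SIZE);
	return example_pool ? 0 : -ENOMEM;
}

static void *example_get_io_object(void)
{
	/*
	 * GFP_NOIO because this can be reached from the reclaim/swapout
	 * path; mempool_alloc() falls back to the reserved elements (or
	 * sleeps until one is returned) when the page allocator cannot
	 * satisfy the request, which is what provides forward progress.
	 */
	return mempool_alloc(example_pool, GFP_NOIO);
}

static void example_put_io_object(void *obj)
{
	/* Freeing an element can wake a waiter blocked in mempool_alloc(). */
	mempool_free(obj, example_pool);
}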