From patchwork Mon Jul 18 08:41:25 2016
X-Patchwork-Submitter: Michal Hocko
X-Patchwork-Id: 9234203
X-Patchwork-Delegate: snitzer@redhat.com
From: Michal Hocko
Date: Mon, 18 Jul 2016 10:41:25 +0200
Message-Id: <1468831285-27242-2-git-send-email-mhocko@kernel.org>
In-Reply-To: <1468831285-27242-1-git-send-email-mhocko@kernel.org>
References: <1468831164-26621-1-git-send-email-mhocko@kernel.org>
 <1468831285-27242-1-git-send-email-mhocko@kernel.org>
Cc: Michal Hocko, Tetsuo Handa, LKML, dm-devel@redhat.com,
 Mikulas Patocka, Mel Gorman, David Rientjes, Ondrej Kozina,
 Andrew Morton
Subject: [dm-devel] [RFC PATCH 2/2] mm, mempool: do not throttle PF_LESS_THROTTLE tasks
List-Id: device-mapper development

From: Michal Hocko

Mikulas has reported that swap backed by dm-crypt doesn't work properly
because the swapout cannot make sufficient forward progress: the
writeout path depends on the dm_crypt worker, which has to allocate
memory to perform the encryption. To guarantee forward progress the
worker relies on the mempool allocator. mempool_alloc(), however,
prefers to use the underlying (usually page) allocator before it grabs
objects from the pool. Such an allocation can dive into memory reclaim
and consequently into throttle_vm_writeout(). If there are too many
dirty pages or pages under writeback, the worker gets throttled even
though it is in fact a flusher trying to clean those very pages.

[  345.352536] kworker/u4:0    D ffff88003df7f438 10488     6      2 0x00000000
[  345.352536] Workqueue: kcryptd kcryptd_crypt [dm_crypt]
[  345.352536]  ffff88003df7f438 ffff88003e5d0380 ffff88003e5d0380 ffff88003e5d8e80
[  345.352536]  ffff88003dfb3240 ffff88003df73240 ffff88003df80000 ffff88003df7f470
[  345.352536]  ffff88003e5d0380 ffff88003e5d0380 ffff88003df7f828 ffff88003df7f450
[  345.352536] Call Trace:
[  345.352536]  [] schedule+0x3c/0x90
[  345.352536]  [] schedule_timeout+0x1d8/0x360
[  345.352536]  [] ? detach_if_pending+0x1c0/0x1c0
[  345.352536]  [] ? ktime_get+0xb3/0x150
[  345.352536]  [] ? __delayacct_blkio_start+0x1f/0x30
[  345.352536]  [] io_schedule_timeout+0xa4/0x110
[  345.352536]  [] congestion_wait+0x86/0x1f0
[  345.352536]  [] ? prepare_to_wait_event+0xf0/0xf0
[  345.352536]  [] throttle_vm_writeout+0x44/0xd0
[  345.352536]  [] shrink_zone_memcg+0x613/0x720
[  345.352536]  [] shrink_zone+0xe0/0x300
[  345.352536]  [] do_try_to_free_pages+0x1ad/0x450
[  345.352536]  [] try_to_free_pages+0xef/0x300
[  345.352536]  [] __alloc_pages_nodemask+0x879/0x1210
[  345.352536]  [] ? sched_clock_cpu+0x90/0xc0
[  345.352536]  [] alloc_pages_current+0xa1/0x1f0
[  345.352536]  [] ? new_slab+0x3f5/0x6a0
[  345.352536]  [] new_slab+0x2d7/0x6a0
[  345.352536]  [] ? sched_clock_local+0x17/0x80
[  345.352536]  [] ___slab_alloc+0x3fb/0x5c0
[  345.352536]  [] ? mempool_alloc_slab+0x1d/0x30
[  345.352536]  [] ? sched_clock_local+0x17/0x80
[  345.352536]  [] ? mempool_alloc_slab+0x1d/0x30
[  345.352536]  [] __slab_alloc+0x51/0x90
[  345.352536]  [] ? mempool_alloc_slab+0x1d/0x30
[  345.352536]  [] kmem_cache_alloc+0x27b/0x310
[  345.352536]  [] mempool_alloc_slab+0x1d/0x30
[  345.352536]  [] mempool_alloc+0x91/0x230
[  345.352536]  [] bio_alloc_bioset+0xbd/0x260
[  345.352536]  [] kcryptd_crypt+0x114/0x3b0 [dm_crypt]
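
For orientation, here is a condensed sketch of the pre-patch
mempool_alloc() control flow, reconstructed from the 4.7-era
mm/mempool.c that the diff below modifies. It is illustrative only, not
verbatim kernel source: locking, kmemleak bookkeeping and the
wait-for-free path are omitted, the function name is hypothetical, and
remove_element() is the pool-internal helper. The trace above
corresponds to the retry with the caller's full gfp mask:

/*
 * Simplified sketch of pre-patch mempool_alloc() -- not the verbatim
 * kernel source.
 */
void *mempool_alloc_sketch(mempool_t *pool, gfp_t gfp_mask)
{
	/* First try the underlying allocator, but fail fast: no
	 * direct reclaim and no IO on this attempt. */
	gfp_t gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM | __GFP_IO);
	void *element = pool->alloc(gfp_temp, pool->pool_data);

	if (element)
		return element;

	/* Fall back to a pre-allocated object from the pool. */
	if (pool->curr_nr)
		return remove_element(pool);

	/*
	 * Pool exhausted: retry the underlying allocator with the
	 * caller's full mask. With __GFP_DIRECT_RECLAIM set this can
	 * enter try_to_free_pages() and, before this patch, stall in
	 * throttle_vm_writeout() -- the state captured in the trace
	 * above.
	 */
	if (gfp_temp != gfp_mask)
		return pool->alloc(gfp_mask, pool->pool_data);

	return NULL;	/* the real code may instead sleep on pool->wait */
}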
Memory pools are usually used in the writeback paths, and it doesn't
make much sense to throttle them just because there are too many dirty
or writeback pages: the main purpose of throttle_vm_writeout() is to
make sure that the pageout path doesn't generate too much dirty data,
and since the mempool path performs __GFP_NORETRY requests, that risk
is low.

Fix this by ensuring that mempool users get PF_LESS_THROTTLE and that
such processes are not throttled in throttle_vm_writeout(). They can
still be throttled by the current_may_throttle() sleeps, but that only
happens when the backing device itself is congested, which is the
proper reaction. Please note that the bonus given by
domain_dirty_limits() alone is not sufficient: at least dm-crypt has to
double buffer each page under writeback (a plaintext page plus a page
for the encrypted payload), so the raised limit alone won't prevent the
worker from being throttled. There are other users of the flag, but
they are in the writeout path as well, so this looks like the proper
behavior for them too.
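
One detail worth calling out in the mempool_alloc() hunks below: the
patch snapshots the caller's flags and restores them with
tsk_restore_flags() instead of unconditionally clearing
PF_LESS_THROTTLE on the way out. That matters because the caller may
already be running with the flag set (the existing users are in the
writeout path), and clearing it would silently drop their state. A
minimal illustration of the idiom, with the allocation itself elided:

	unsigned int pflags = current->flags;	/* snapshot caller's flags */

	if (gfpflags_allow_blocking(gfp_mask))
		current->flags |= PF_LESS_THROTTLE;

	/* ... the allocation attempts, which may enter direct reclaim ... */

	/*
	 * Restore PF_LESS_THROTTLE to whatever the caller had, rather
	 * than clearing it: a task that entered with the flag already
	 * set keeps it.
	 */
	if (gfpflags_allow_blocking(gfp_mask))
		tsk_restore_flags(current, pflags, PF_LESS_THROTTLE);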
Reported-by: Mikulas Patocka
Signed-off-by: Michal Hocko
Reviewed-by: Mikulas Patocka
Tested-by: Mikulas Patocka
---
 mm/mempool.c        | 19 +++++++++++++++----
 mm/page-writeback.c |  3 +++
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/mm/mempool.c b/mm/mempool.c
index ea26d75c8adf..916e95c4192c 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -310,7 +310,8 @@ EXPORT_SYMBOL(mempool_resize);
  */
 void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
 {
-	void *element;
+	unsigned int pflags = current->flags;
+	void *element = NULL;
 	unsigned long flags;
 	wait_queue_t wait;
 	gfp_t gfp_temp;
@@ -328,6 +329,12 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
 
 	gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO);
 
+	/*
+	 * Make sure that the allocation doesn't get throttled during the
+	 * reclaim
+	 */
+	if (gfpflags_allow_blocking(gfp_mask))
+		current->flags |= PF_LESS_THROTTLE;
 repeat_alloc:
 	/*
 	 * Make sure that the OOM victim will get access to memory reserves
@@ -339,7 +346,7 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
 
 	element = pool->alloc(gfp_temp, pool->pool_data);
 	if (likely(element != NULL))
-		return element;
+		goto out;
 
 	spin_lock_irqsave(&pool->lock, flags);
 	if (likely(pool->curr_nr)) {
@@ -352,7 +359,7 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
 		 * for debugging.
 		 */
 		kmemleak_update_trace(element);
-		return element;
+		goto out;
 	}
 
 	/*
@@ -369,7 +376,7 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
 	/* We must not sleep if !__GFP_DIRECT_RECLAIM */
 	if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
 		spin_unlock_irqrestore(&pool->lock, flags);
-		return NULL;
+		goto out;
 	}
 
 	/* Let's wait for someone else to return an element to @pool */
@@ -386,6 +393,10 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
 
 	finish_wait(&pool->wait, &wait);
 	goto repeat_alloc;
+out:
+	if (gfpflags_allow_blocking(gfp_mask))
+		tsk_restore_flags(current, pflags, PF_LESS_THROTTLE);
+	return element;
 }
 EXPORT_SYMBOL(mempool_alloc);
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7fbb2d008078..a37661f1a11b 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1971,6 +1971,9 @@ void throttle_vm_writeout(gfp_t gfp_mask)
 	unsigned long background_thresh;
 	unsigned long dirty_thresh;
 
+	if (current->flags & PF_LESS_THROTTLE)
+		return;
+
 	for ( ; ; ) {
 		global_dirty_limits(&background_thresh, &dirty_thresh);
 		dirty_thresh = hard_dirty_limit(&global_wb_domain, dirty_thresh);
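
For context on the residual throttling the commit message mentions: the
current_may_throttle() check lives in mm/vmscan.c and, quoted here from
memory of the same kernel era (verify against your tree), a
PF_LESS_THROTTLE task only sleeps there when its backing device is
actually congested:

static bool current_may_throttle(void)
{
	return !(current->flags & PF_LESS_THROTTLE) ||
		current->backing_dev_info == NULL ||
		bdi_write_congested(current->backing_dev_info);
}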