From patchwork Sun Jan 3 07:12:47 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Al Viro
X-Patchwork-Id: 7942971
Date: Sun, 3 Jan 2016 07:12:47 +0000
From: Al Viro
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Dave Chinner, Ming Lei
Subject: __vmalloc() vs. GFP_NOIO/GFP_NOFS
Message-ID: <20160103071246.GK9938@ZenIV.linux.org.uk>
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Mailing-List: linux-fsdevel@vger.kernel.org

While trying to write documentation on allocator choice, I've run into
something odd:

        /*
         * __vmalloc() will allocate data pages and auxillary structures (e.g.
         * pagetables) with GFP_KERNEL, yet we may be under GFP_NOFS context
         * here. Hence we need to tell memory reclaim that we are in such a
         * context via PF_MEMALLOC_NOIO to prevent memory reclaim re-entering
         * the filesystem here and potentially deadlocking.
         */

in XFS kmem_zalloc_large().  The comment is correct - __vmalloc()
(actually, map_vm_area() called from __vmalloc_area_node()) ignores
gfp_flags; prior to that point it does take care to pass
__GFP_IO/__GFP_FS to the page allocator, but once the data pages are
allocated and we get around to inserting them into page tables, those
flags are ignored.  Page table allocation doesn't take a gfp argument
at all.  Trying to propagate it down there could be done, but it's not
attractive.

Another approach is memalloc_noio_save(), actually used by XFS and some
other __vmalloc() callers that might be getting GFP_NOIO or GFP_NOFS.
That works, but not all such callers are using that mechanism.  For
example, drbd bm_realloc_pages() has a GFP_NOIO __vmalloc() with no
memalloc_noio_... in sight.  Either that GFP_NOIO is not needed there
(quite possible) or there's a deadlock in that code.  The same goes for
ipoib.c ipoib_cm_tx_init(); again, either that GFP_NOIO is not needed,
or it can deadlock.

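To make the caller-side pattern concrete, here is a rough userspace model
of it.  This is not kernel code: every model_* name and MODEL_* constant
is a hypothetical stand-in, and the model assumes that a task-wide NOIO
flag makes the allocator strip both the IO and FS bits from whatever mask
an inner allocation uses.  It only illustrates why an allocation that is
hard-wired to GFP_KERNEL (the page table case) is still constrained when
the caller brackets it with save/restore.

```c
#include <assert.h>

/* Userspace model only; these are simplified stand-ins, not kernel API. */
#define MODEL_GFP_IO            0x1u
#define MODEL_GFP_FS            0x2u
#define MODEL_GFP_KERNEL        (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_PF_MEMALLOC_NOIO  0x1u

static unsigned int task_flags;        /* stands in for current->flags */

/* Set the task-wide NOIO flag and return its previous state. */
static unsigned int model_noio_save(void)
{
	unsigned int old = task_flags & MODEL_PF_MEMALLOC_NOIO;

	task_flags |= MODEL_PF_MEMALLOC_NOIO;
	return old;
}

/* Put the NOIO flag back to the state returned by model_noio_save(). */
static void model_noio_restore(unsigned int old)
{
	task_flags = (task_flags & ~MODEL_PF_MEMALLOC_NOIO) | old;
}

/* Mask that reclaim would effectively honour for a requested gfp. */
static unsigned int model_effective_gfp(unsigned int gfp)
{
	if (task_flags & MODEL_PF_MEMALLOC_NOIO)
		gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
	return gfp;
}

/* Page table allocation: no gfp argument, always asks for GFP_KERNEL. */
static unsigned int model_pagetable_alloc(void)
{
	return model_effective_gfp(MODEL_GFP_KERNEL);
}

/* Caller-side pattern: bracket the mapping step with save/restore. */
static unsigned int model_vmalloc_noio(void)
{
	unsigned int old = model_noio_save();
	unsigned int seen = model_pagetable_alloc();

	model_noio_restore(old);
	return seen;
}

/* Small self-check of the model; returns 0 if every assertion holds. */
static int model_demo(void)
{
	/* Without the bracket, the inner allocation is a plain GFP_KERNEL. */
	assert(model_pagetable_alloc() == MODEL_GFP_KERNEL);
	/* With it, the IO and FS bits are both gone... */
	assert(model_vmalloc_noio() == 0);
	/* ...and the task flag is back to its previous state afterwards. */
	assert(task_flags == 0);
	return 0;
}
```

The save/restore pair returns and takes the old state so that nested
users do not clear a flag set by an outer caller.
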
Those, AFAICS, are such callers with GFP_NOIO; however, there's a
shitload of GFP_NOFS ones.  XFS uses memalloc_noio_save(), but a _lot_
of other callers do not.  For example, all call chains leading to
ceph_kvmalloc() pass GFP_NOFS and none of them is under
memalloc_noio_save().  The same goes for GFS2 __vmalloc() callers, etc.
Again, quite a few of those probably do not need GFP_NOFS at all, but
those that do would appear to have hard-to-trigger deadlocks.

Why do we do that in callers, though?  I.e. why not do something like
this:

---
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 8e3c9c5..412c5d6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1622,6 +1622,16 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 		cond_resched();
 	}
 
+	if (unlikely(!(gfp_mask & __GFP_IO))) {
+		unsigned flags = memalloc_noio_save();
+		if (map_vm_area(area, prot, pages)) {
+			memalloc_noio_restore(flags);
+			goto fail;
+		}
+		memalloc_noio_restore(flags);
+		return area->addr;
+	}
+
 	if (map_vm_area(area, prot, pages))
 		goto fail;
 	return area->addr;