From patchwork Fri Sep 23 07:24:34 2016
X-Patchwork-Submitter: Nicholas Piggin
X-Patchwork-Id: 9347561
Date: Fri, 23 Sep 2016 17:24:34 +1000
From: Nicholas Piggin
To: "Hillf Danton"
Cc: "'Vlastimil Babka'", "'Alexander Viro'", "'Michal Hocko'", Eric Dumazet
Subject: Re: [PATCH] fs/select: add vmalloc fallback for select(2)
Message-ID: <20160923172434.7ad8f2e0@roar.ozlabs.ibm.com>
In-Reply-To: <006101d21565$b60a8a70$221f9f50$@alibaba-inc.com>
References: <20160922152831.24165-1-vbabka@suse.cz>
 <006101d21565$b60a8a70$221f9f50$@alibaba-inc.com>
Organization: IBM
X-Mailing-List: linux-fsdevel@vger.kernel.org

On Fri, 23 Sep 2016 14:42:53 +0800
"Hillf Danton" wrote:

> >
> > The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows
> > with the number of fds passed. We had a customer report page allocation
> > failures of order-4 for this allocation. This is a costly order, so it might
> > easily fail, as the VM expects such allocation to have a lower-order fallback.
> >
> > Such trivial fallback is vmalloc(), as the memory doesn't have to be
> > physically contiguous. Also the allocation is temporary for the duration of the
> > syscall, so it's unlikely to stress vmalloc too much.
> >
> > Note that the poll(2) syscall seems to use a linked list of order-0 pages, so
> > it doesn't need this kind of fallback.

How about something like this? (untested)

Eric isn't wrong about vmalloc sucking :)

Thanks,
Nick

---
 fs/select.c | 57 +++++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 43 insertions(+), 14 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index 8ed9da5..3b4834c 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -555,6 +555,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
 	void *bits;
 	int ret, max_fds;
 	unsigned int size;
+	size_t nr_bytes;
 	struct fdtable *fdt;
 	/* Allocate small arguments on the stack to save memory and be faster */
 	long stack_fds[SELECT_STACK_ALLOC/sizeof(long)];
@@ -576,21 +577,39 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
 	 * since we used fdset we need to allocate memory in units of
 	 * long-words.
 	 */
-	size = FDS_BYTES(n);
+	ret = -ENOMEM;
 	bits = stack_fds;
-	if (size > sizeof(stack_fds) / 6) {
-		/* Not enough space in on-stack array; must use kmalloc */
+	size = FDS_BYTES(n);
+	nr_bytes = 6 * size;
+
+	if (unlikely(nr_bytes > PAGE_SIZE)) {
+		/* Avoid multi-page allocation if possible */
 		ret = -ENOMEM;
-		bits = kmalloc(6 * size, GFP_KERNEL);
-		if (!bits)
-			goto out_nofds;
+		fds.in = kmalloc(size, GFP_KERNEL);
+		fds.out = kmalloc(size, GFP_KERNEL);
+		fds.ex = kmalloc(size, GFP_KERNEL);
+		fds.res_in = kmalloc(size, GFP_KERNEL);
+		fds.res_out = kmalloc(size, GFP_KERNEL);
+		fds.res_ex = kmalloc(size, GFP_KERNEL);
+
+		if (!(fds.in && fds.out && fds.ex &&
+				fds.res_in && fds.res_out && fds.res_ex))
+			goto out;
+	} else {
+		if (nr_bytes > sizeof(stack_fds)) {
+			/* Not enough space in on-stack array */
+			if (nr_bytes > PAGE_SIZE * 2)
+				bits = kmalloc(nr_bytes, GFP_KERNEL);
+			if (!bits)
+				goto out_nofds;
+		}
+		fds.in = bits;
+		fds.out = bits + size;
+		fds.ex = bits + 2*size;
+		fds.res_in = bits + 3*size;
+		fds.res_out = bits + 4*size;
+		fds.res_ex = bits + 5*size;
 	}
-	fds.in = bits;
-	fds.out = bits + size;
-	fds.ex = bits + 2*size;
-	fds.res_in = bits + 3*size;
-	fds.res_out = bits + 4*size;
-	fds.res_ex = bits + 5*size;
 
 	if ((ret = get_fd_set(n, inp, fds.in)) ||
 	    (ret = get_fd_set(n, outp, fds.out)) ||
@@ -617,8 +636,18 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
 		ret = -EFAULT;
 
 out:
-	if (bits != stack_fds)
-		kfree(bits);
+	if (unlikely(nr_bytes > PAGE_SIZE)) {
+		kfree(fds.in);
+		kfree(fds.out);
+		kfree(fds.ex);
+		kfree(fds.res_in);
+		kfree(fds.res_out);
+		kfree(fds.res_ex);
+	} else {
+		if (bits != stack_fds)
+			kfree(bits);
+	}
+
 out_nofds:
 	return ret;
 }
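
For readers following the sizing logic, the three-way choice in the diff (on-stack array, one kmalloc for all six bitmaps, or six separate per-bitmap allocations once the total exceeds a page) can be tried out in userspace. What follows is only a sketch under stated assumptions, not the kernel code: malloc() stands in for kmalloc(), and the SKETCH_* constants, pick_strategy(), fdset_alloc() and fdset_free() are hypothetical names with assumed values (4096-byte page, 256-byte stack budget). One deliberate difference from the posted, untested diff: there the inner `nr_bytes > PAGE_SIZE * 2` test sits in a branch where nr_bytes is already at most PAGE_SIZE, so it can never be true; the sketch allocates the single buffer whenever the stack array is too small.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed values for illustration only; the kernel defines its own. */
#define SKETCH_PAGE_SIZE	4096u
#define SKETCH_STACK_ALLOC	256u
#define SKETCH_BITS_PER_LONG	(8 * sizeof(long))

enum fdset_strategy { FDSET_STACK, FDSET_ONE_BUFFER, FDSET_SIX_BUFFERS };

struct fdset_bits {
	unsigned long *in, *out, *ex, *res_in, *res_out, *res_ex;
	void *bits;	/* heap buffer for FDSET_ONE_BUFFER, otherwise NULL */
	enum fdset_strategy strategy;
};

/* Bytes needed for one bitmap of n fds, rounded up to whole longs. */
static size_t fds_bytes(unsigned int n)
{
	return ((n + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG) *
		sizeof(long);
}

/* nr_bytes is the total for all six bitmaps. */
static enum fdset_strategy pick_strategy(size_t nr_bytes)
{
	if (nr_bytes > SKETCH_PAGE_SIZE)
		return FDSET_SIX_BUFFERS;	/* avoid a multi-page allocation */
	if (nr_bytes > SKETCH_STACK_ALLOC)
		return FDSET_ONE_BUFFER;	/* fits in one page */
	return FDSET_STACK;
}

static void fdset_free(struct fdset_bits *fds)
{
	if (fds->strategy == FDSET_SIX_BUFFERS) {
		free(fds->in);
		free(fds->out);
		free(fds->ex);
		free(fds->res_in);
		free(fds->res_out);
		free(fds->res_ex);
	} else {
		free(fds->bits);	/* NULL when the stack array was used */
	}
}

/* Returns 0 on success, -1 on allocation failure. */
static int fdset_alloc(struct fdset_bits *fds, unsigned int n, void *stack_buf)
{
	size_t size = fds_bytes(n);
	size_t nr_bytes = 6 * size;
	unsigned long **slot[6] = { &fds->in, &fds->out, &fds->ex,
				    &fds->res_in, &fds->res_out, &fds->res_ex };
	char *base = stack_buf;
	int i, failed = 0;

	memset(fds, 0, sizeof(*fds));
	fds->strategy = pick_strategy(nr_bytes);

	if (fds->strategy == FDSET_SIX_BUFFERS) {
		/* One small allocation per bitmap, each of `size` bytes. */
		for (i = 0; i < 6; i++) {
			*slot[i] = malloc(size);
			if (!*slot[i])
				failed = 1;
		}
		if (failed) {
			fdset_free(fds);	/* free(NULL) is a no-op */
			return -1;
		}
		return 0;
	}

	if (fds->strategy == FDSET_ONE_BUFFER) {
		fds->bits = malloc(nr_bytes);
		if (!fds->bits)
			return -1;
		base = fds->bits;
	}

	/* Carve the six bitmaps out of one contiguous buffer. */
	for (i = 0; i < 6; i++)
		*slot[i] = (unsigned long *)(base + i * size);
	return 0;
}
```

A caller would pass its own `long stack_buf[SKETCH_STACK_ALLOC / sizeof(long)]` to fdset_alloc() and always pair a successful call with fdset_free(), which mirrors how the `out:` label in the diff frees either the six buffers or the single one.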