From patchwork Mon Apr 20 20:49:51 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Dumazet X-Patchwork-Id: 6244311 Return-Path: X-Original-To: patchwork-linux-fsdevel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 2BDB19F1C4 for ; Mon, 20 Apr 2015 20:50:16 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 43D902008F for ; Mon, 20 Apr 2015 20:50:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 4E9B620034 for ; Mon, 20 Apr 2015 20:50:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754724AbbDTUt7 (ORCPT ); Mon, 20 Apr 2015 16:49:59 -0400 Received: from mail-ie0-f169.google.com ([209.85.223.169]:33653 "EHLO mail-ie0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754494AbbDTUty (ORCPT ); Mon, 20 Apr 2015 16:49:54 -0400 Received: by iecrt8 with SMTP id rt8so113594138iec.0; Mon, 20 Apr 2015 13:49:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:subject:from:to:cc:date:in-reply-to:references :content-type:mime-version:content-transfer-encoding; bh=zR4Y34g/KtFKsfcsh9A9InUj8V/VG5EiWWx4Zm2qOZU=; b=yDpjslBzeU5N4u2j8pvIDQBKr4OXi875higWZLv9GVL+0lAZnU67Jv+rxTijdRj9bt z9q6NcRavYegATayz1ftiVra/Qavvymltj4ZB1+m85Cih4Q48Jup9jMROSyBdtj9CPu7 iMR79AnyvxXwBcXfYi0kWEESyrmWjlqfgfYoAhTLFNY8zia3vTMHFpuwtcdVjZV6EXT7 oEZub1/g0SQ6LaeJc5jUxsIrqpY6RdjY078fNunK29WMxVgO8W4Swxzhp3rY/SeHjKMT nhycN/O00nkIJjqfiqQwiimF+386IST/uRiQCb4ybCPMJXMRy8yFE9R9SIaGrByjXSr0 ik4g== X-Received: by 10.107.137.143 with SMTP id t15mr24921903ioi.16.1429562993575; Mon, 20 Apr 2015 13:49:53 -0700 (PDT) Received: from [172.19.245.82] ([172.19.245.82]) by mx.google.com with ESMTPSA id p196sm9031413iop.15.2015.04.20.13.49.52 (version=TLSv1.2 cipher=AES128-GCM-SHA256 bits=128/128); Mon, 20 Apr 2015 13:49:53 -0700 (PDT) Message-ID: <1429562991.7346.290.camel@edumazet-glaptop2.roam.corp.google.com> Subject: Re: [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install From: Eric Dumazet To: Mateusz Guzik Cc: Al Viro , Andrew Morton , "Paul E. McKenney" , Yann Droneaud , Konstantin Khlebnikov , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Mon, 20 Apr 2015 13:49:51 -0700 In-Reply-To: <1429550126.7346.268.camel@edumazet-glaptop2.roam.corp.google.com> References: <20150416121628.GA20615@mguzik> <1429307216.7346.255.camel@edumazet-glaptop2.roam.corp.google.com> <20150417221646.GA15589@mguzik> <20150417230252.GE889@ZenIV.linux.org.uk> <20150420130633.GA2513@mguzik> <20150420134326.GC2513@mguzik> <20150420151054.GD2513@mguzik> <1429550126.7346.268.camel@edumazet-glaptop2.roam.corp.google.com> X-Mailer: Evolution 3.10.4-0ubuntu2 Mime-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Spam-Status: No, score=-6.8 required=5.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FREEMAIL_FROM, RCVD_IN_DNSWL_HI, T_DKIM_INVALID, T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Mon, 2015-04-20 at 10:15 -0700, Eric Dumazet wrote: > On Mon, 2015-04-20 at 17:10 +0200, Mateusz Guzik wrote: > > > Sorry for spam but I came up with another hack. :) > > > > The idea is that we can have a variable which would signify the that > > given thread is playing with fd table in fd_install (kind of a lock > > embedded into task_struct). We would also have a flag in files struct > > indicating that a thread would like to resize it. > > > > expand_fdtable would set the flag and iterate over all threads waiting > > for all of them to have the var set to 0. > > The opposite : you have to block them in some way and add a rcu_sched() > or something. Here is the patch I cooked here but not yet tested. --- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/file.c b/fs/file.c index 93c5f89c248b..d72bdcacd4df 100644 --- a/fs/file.c +++ b/fs/file.c @@ -145,17 +145,23 @@ static int expand_fdtable(struct files_struct *files, int nr) { struct fdtable *new_fdt, *cur_fdt; + files->resize_in_progress++; spin_unlock(&files->file_lock); new_fdt = alloc_fdtable(nr); + /* make sure all __fd_install() have seen resize_in_progress */ + synchronize_sched(); spin_lock(&files->file_lock); - if (!new_fdt) + if (!new_fdt) { + files->resize_in_progress--; return -ENOMEM; + } /* * extremely unlikely race - sysctl_nr_open decreased between the check in * caller and alloc_fdtable(). Cheaper to catch it here... */ if (unlikely(new_fdt->max_fds <= nr)) { __free_fdtable(new_fdt); + files->resize_in_progress--; return -EMFILE; } /* @@ -173,6 +179,9 @@ static int expand_fdtable(struct files_struct *files, int nr) /* Somebody else expanded, so undo our attempt */ __free_fdtable(new_fdt); } + /* coupled with smp_rmb() in __fd_install() */ + smp_wmb(); + files->resize_in_progress--; return 1; } @@ -256,6 +265,7 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp) atomic_set(&newf->count, 1); spin_lock_init(&newf->file_lock); + newf->resize_in_progress = 0; newf->next_fd = 0; new_fdt = &newf->fdtab; new_fdt->max_fds = NR_OPEN_DEFAULT; @@ -553,11 +563,21 @@ void __fd_install(struct files_struct *files, unsigned int fd, struct file *file) { struct fdtable *fdt; - spin_lock(&files->file_lock); + + rcu_read_lock_sched(); + + while (unlikely(READ_ONCE(files->resize_in_progress))) { + rcu_read_unlock_sched(); + yield(); + rcu_read_lock_sched(); + } + /* coupled with smp_wmb() in expand_fdtable() */ + smp_rmb(); fdt = files_fdtable(files); BUG_ON(fdt->fd[fd] != NULL); rcu_assign_pointer(fdt->fd[fd], file); - spin_unlock(&files->file_lock); + + rcu_read_unlock_sched(); } void fd_install(unsigned int fd, struct file *file) diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index 230f87bdf5ad..7496a0d73a7c 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -47,6 +47,7 @@ struct files_struct { * read mostly part */ atomic_t count; + int resize_in_progress; struct fdtable __rcu *fdt; struct fdtable fdtab; /*