From patchwork Thu Apr 16 12:16:31 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mateusz Guzik X-Patchwork-Id: 6225501 Return-Path: X-Original-To: patchwork-linux-fsdevel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 09E639F313 for ; Thu, 16 Apr 2015 12:16:50 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 3F01920270 for ; Thu, 16 Apr 2015 12:16:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D21332021B for ; Thu, 16 Apr 2015 12:16:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757348AbbDPMQl (ORCPT ); Thu, 16 Apr 2015 08:16:41 -0400 Received: from mx1.redhat.com ([209.132.183.28]:50224 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755154AbbDPMQk (ORCPT ); Thu, 16 Apr 2015 08:16:40 -0400 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) by mx1.redhat.com (Postfix) with ESMTPS id A6196AB129; Thu, 16 Apr 2015 12:16:39 +0000 (UTC) Received: from mguzik (ovpn-116-18.ams2.redhat.com [10.36.116.18]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id t3GCGWYF019586 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA256 bits=256 verify=NO); Thu, 16 Apr 2015 08:16:36 -0400 Date: Thu, 16 Apr 2015 14:16:31 +0200 From: Mateusz Guzik To: Alexander Viro Cc: Andrew Morton , "Paul E. McKenney" , Yann Droneaud , Konstantin Khlebnikov , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install Message-ID: <20150416121628.GA20615@mguzik> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.23.1-rc1 (2014-03-12) X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Hi, Currently obtaining a new file descriptor results in locking fdtable twice - once in order to reserve a slot and second time to fill it. Hack below gets rid of the second lock usage. It gives me a ~30% speedup (~300k ops -> ~400k ops) in a microbenchmark where 16 threads create a pipe (2 fds) and call 2 * close. Results are fluctuating and even with the patch sometimes drop to around ~300k ops. However, without the patch they never get higher. I can come up with a better benchmark if that's necessary. Comments? ============================================== Subject: [PATCH] fs: use a sequence counter instead of file_lock in fd_install Because the lock is not held, it is possible that fdtable will be reallocated as we fill it. RCU is used to guarantee the old table is not freed just in case we happen to write to it (which is harmless). sequence counter is used to ensure we updated the right table. Signed-off-by: Mateusz Guzik --- fs/file.c | 24 +++++++++++++++++++----- include/linux/fdtable.h | 5 +++++ 2 files changed, 24 insertions(+), 5 deletions(-) diff --git a/fs/file.c b/fs/file.c index 93c5f89..bd1ef4c 100644 --- a/fs/file.c +++ b/fs/file.c @@ -165,8 +165,10 @@ static int expand_fdtable(struct files_struct *files, int nr) cur_fdt = files_fdtable(files); if (nr >= cur_fdt->max_fds) { /* Continue as planned */ + write_seqcount_begin(&files->fdt_seqcount); copy_fdtable(new_fdt, cur_fdt); rcu_assign_pointer(files->fdt, new_fdt); + write_seqcount_end(&files->fdt_seqcount); if (cur_fdt != &files->fdtab) call_rcu(&cur_fdt->rcu, free_fdtable_rcu); } else { @@ -256,6 +258,7 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp) atomic_set(&newf->count, 1); spin_lock_init(&newf->file_lock); + seqcount_init(&newf->fdt_seqcount); newf->next_fd = 0; new_fdt = &newf->fdtab; new_fdt->max_fds = NR_OPEN_DEFAULT; @@ -429,6 +432,7 @@ void exit_files(struct task_struct *tsk) struct files_struct init_files = { .count = ATOMIC_INIT(1), + .fdt_seqcount = SEQCNT_ZERO(fdt_seqcount), .fdt = &init_files.fdtab, .fdtab = { .max_fds = NR_OPEN_DEFAULT, @@ -552,12 +556,22 @@ EXPORT_SYMBOL(put_unused_fd); void __fd_install(struct files_struct *files, unsigned int fd, struct file *file) { + unsigned long seq; struct fdtable *fdt; - spin_lock(&files->file_lock); - fdt = files_fdtable(files); - BUG_ON(fdt->fd[fd] != NULL); - rcu_assign_pointer(fdt->fd[fd], file); - spin_unlock(&files->file_lock); + + rcu_read_lock(); + do { + seq = read_seqcount_begin(&files->fdt_seqcount); + fdt = files_fdtable_seq(files); + /* + * Entry in the table can already be equal to file if we + * had to restart and copy_fdtable picked up our update. + */ + BUG_ON(!(fdt->fd[fd] == NULL || fdt->fd[fd] == file)); + rcu_assign_pointer(fdt->fd[fd], file); + smp_mb(); + } while (__read_seqcount_retry(&files->fdt_seqcount, seq)); + rcu_read_unlock(); } void fd_install(unsigned int fd, struct file *file) diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index 230f87b..9e41765 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -12,6 +12,7 @@ #include #include #include +#include #include @@ -47,6 +48,7 @@ struct files_struct { * read mostly part */ atomic_t count; + seqcount_t fdt_seqcount; struct fdtable __rcu *fdt; struct fdtable fdtab; /* @@ -69,6 +71,9 @@ struct dentry; #define files_fdtable(files) \ rcu_dereference_check_fdtable((files), (files)->fdt) +#define files_fdtable_seq(files) \ + rcu_dereference((files)->fdt) + /* * The caller must ensure that fd table isn't shared or hold rcu or file lock */