From patchwork Thu Aug 9 02:04:41 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10560693 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C4B3414E5 for ; Thu, 9 Aug 2018 02:07:43 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A9B3A2AC88 for ; Thu, 9 Aug 2018 02:07:43 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9DC652AC8B; Thu, 9 Aug 2018 02:07:43 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C16DB2AC88 for ; Thu, 9 Aug 2018 02:07:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727560AbeHIE36 (ORCPT ); Thu, 9 Aug 2018 00:29:58 -0400 Received: from mx2.suse.de ([195.135.220.15]:60426 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725757AbeHIE35 (ORCPT ); Thu, 9 Aug 2018 00:29:57 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 641ECACE8; Thu, 9 Aug 2018 02:07:33 +0000 (UTC) From: NeilBrown To: Jeff Layton , Alexander Viro Date: Thu, 09 Aug 2018 12:04:41 +1000 Subject: [PATCH 1/5] fs/locks: rename some lists and pointers. Cc: "J. Bruce Fields" , Martin Wilck , linux-fsdevel@vger.kernel.org, Frank Filz , linux-kernel@vger.kernel.org Message-ID: <153378028105.1220.2668473677432721764.stgit@noble> In-Reply-To: <153378012255.1220.6754153662007899557.stgit@noble> References: <153378012255.1220.6754153662007899557.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP struct file lock contains an 'fl_next' pointer which is used to point to the lock that this request is blocked waiting for. So rename it to fl_blocker. The fl_blocked list_head in an active lock is the head of a list of blocked requests. In a request it is a node in that list. These are two distinct uses, so replace with two list_heads with different names. fl_blocked is the head of a list of blocked requests fl_block is a node on that list. This separation will be used in the next patch. Note that a tracepoint is changed to report fl_blocker instead of fl_next. Might this be an ABI breakage? Signed-off-by: NeilBrown --- fs/cifs/file.c | 2 +- fs/locks.c | 40 ++++++++++++++++++++------------------- include/linux/fs.h | 5 +++-- include/trace/events/filelock.h | 16 ++++++++-------- 4 files changed, 33 insertions(+), 30 deletions(-) diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 8d41ca7bfcf1..066ed2e4ba96 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -1092,7 +1092,7 @@ cifs_posix_lock_set(struct file *file, struct file_lock *flock) rc = posix_lock_file(file, flock, NULL); up_write(&cinode->lock_sem); if (rc == FILE_LOCK_DEFERRED) { - rc = wait_event_interruptible(flock->fl_wait, !flock->fl_next); + rc = wait_event_interruptible(flock->fl_wait, !flock->fl_blocker); if (!rc) goto try_again; posix_unblock_lock(flock); diff --git a/fs/locks.c b/fs/locks.c index db7b6917d9c5..322491e70e41 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -194,7 +194,7 @@ static DEFINE_HASHTABLE(blocked_hash, BLOCKED_HASH_BITS); * This lock protects the blocked_hash. Generally, if you're accessing it, you * want to be holding this lock. * - * In addition, it also protects the fl->fl_block list, and the fl->fl_next + * In addition, it also protects the fl->fl_blocked list, and the fl->fl_blocker * pointer for file_lock structures that are acting as lock requests (in * contrast to those that are acting as records of acquired locks). * @@ -203,7 +203,7 @@ static DEFINE_HASHTABLE(blocked_hash, BLOCKED_HASH_BITS); * protected by this lock, we can skip acquiring it iff we already hold the * flc_lock. * - * In particular, adding an entry to the fl_block list requires that you hold + * In particular, adding an entry to the fl_blocked list requires that you hold * both the flc_lock and the blocked_lock_lock (acquired in that order). * Deleting an entry from the list however only requires the file_lock_lock. */ @@ -302,6 +302,7 @@ static void locks_init_lock_heads(struct file_lock *fl) { INIT_HLIST_NODE(&fl->fl_link); INIT_LIST_HEAD(&fl->fl_list); + INIT_LIST_HEAD(&fl->fl_blocked); INIT_LIST_HEAD(&fl->fl_block); init_waitqueue_head(&fl->fl_wait); } @@ -341,6 +342,7 @@ void locks_free_lock(struct file_lock *fl) { BUG_ON(waitqueue_active(&fl->fl_wait)); BUG_ON(!list_empty(&fl->fl_list)); + BUG_ON(!list_empty(&fl->fl_blocked)); BUG_ON(!list_empty(&fl->fl_block)); BUG_ON(!hlist_unhashed(&fl->fl_link)); @@ -676,7 +678,7 @@ static void __locks_delete_block(struct file_lock *waiter) { locks_delete_global_blocked(waiter); list_del_init(&waiter->fl_block); - waiter->fl_next = NULL; + waiter->fl_blocker = NULL; } static void locks_delete_block(struct file_lock *waiter) @@ -692,16 +694,16 @@ static void locks_delete_block(struct file_lock *waiter) * it seems like the reasonable thing to do. * * Must be called with both the flc_lock and blocked_lock_lock held. The - * fl_block list itself is protected by the blocked_lock_lock, but by ensuring + * fl_blocked list itself is protected by the blocked_lock_lock, but by ensuring * that the flc_lock is also held on insertions we can avoid taking the - * blocked_lock_lock in some cases when we see that the fl_block list is empty. + * blocked_lock_lock in some cases when we see that the fl_blocked list is empty. */ static void __locks_insert_block(struct file_lock *blocker, struct file_lock *waiter) { BUG_ON(!list_empty(&waiter->fl_block)); - waiter->fl_next = blocker; - list_add_tail(&waiter->fl_block, &blocker->fl_block); + waiter->fl_blocker = blocker; + list_add_tail(&waiter->fl_block, &blocker->fl_blocked); if (IS_POSIX(blocker) && !IS_OFDLCK(blocker)) locks_insert_global_blocked(waiter); } @@ -725,18 +727,18 @@ static void locks_wake_up_blocks(struct file_lock *blocker) /* * Avoid taking global lock if list is empty. This is safe since new * blocked requests are only added to the list under the flc_lock, and - * the flc_lock is always held here. Note that removal from the fl_block + * the flc_lock is always held here. Note that removal from the fl_blocked * list does not require the flc_lock, so we must recheck list_empty() * after acquiring the blocked_lock_lock. */ - if (list_empty(&blocker->fl_block)) + if (list_empty(&blocker->fl_blocked)) return; spin_lock(&blocked_lock_lock); - while (!list_empty(&blocker->fl_block)) { + while (!list_empty(&blocker->fl_blocked)) { struct file_lock *waiter; - waiter = list_first_entry(&blocker->fl_block, + waiter = list_first_entry(&blocker->fl_blocked, struct file_lock, fl_block); __locks_delete_block(waiter); if (waiter->fl_lmops && waiter->fl_lmops->lm_notify) @@ -887,7 +889,7 @@ static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl) hash_for_each_possible(blocked_hash, fl, fl_link, posix_owner_key(block_fl)) { if (posix_same_owner(fl, block_fl)) - return fl->fl_next; + return fl->fl_blocker; } return NULL; } @@ -1245,7 +1247,7 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl) error = posix_lock_inode(inode, fl, NULL); if (error != FILE_LOCK_DEFERRED) break; - error = wait_event_interruptible(fl->fl_wait, !fl->fl_next); + error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker); if (!error) continue; @@ -1332,7 +1334,7 @@ int locks_mandatory_area(struct inode *inode, struct file *filp, loff_t start, error = posix_lock_inode(inode, &fl, NULL); if (error != FILE_LOCK_DEFERRED) break; - error = wait_event_interruptible(fl.fl_wait, !fl.fl_next); + error = wait_event_interruptible(fl.fl_wait, !fl.fl_blocker); if (!error) { /* * If we've been sleeping someone might have @@ -1526,7 +1528,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type) locks_dispose_list(&dispose); error = wait_event_interruptible_timeout(new_fl->fl_wait, - !new_fl->fl_next, break_time); + !new_fl->fl_blocker, break_time); percpu_down_read_preempt_disable(&file_rwsem); spin_lock(&ctx->flc_lock); @@ -1940,7 +1942,7 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl) error = flock_lock_inode(inode, fl); if (error != FILE_LOCK_DEFERRED) break; - error = wait_event_interruptible(fl->fl_wait, !fl->fl_next); + error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker); if (!error) continue; @@ -2212,7 +2214,7 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd, error = vfs_lock_file(filp, cmd, fl, NULL); if (error != FILE_LOCK_DEFERRED) break; - error = wait_event_interruptible(fl->fl_wait, !fl->fl_next); + error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker); if (!error) continue; @@ -2583,7 +2585,7 @@ posix_unblock_lock(struct file_lock *waiter) int status = 0; spin_lock(&blocked_lock_lock); - if (waiter->fl_next) + if (waiter->fl_blocker) __locks_delete_block(waiter); else status = -ENOENT; @@ -2711,7 +2713,7 @@ static int locks_show(struct seq_file *f, void *v) lock_get_status(f, fl, iter->li_pos, ""); - list_for_each_entry(bfl, &fl->fl_block, fl_block) + list_for_each_entry(bfl, &fl->fl_blocked, fl_block) lock_get_status(f, bfl, iter->li_pos, " ->"); return 0; diff --git a/include/linux/fs.h b/include/linux/fs.h index 805bf22898cf..48f7f36c3772 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1002,10 +1002,11 @@ bool opens_in_grace(struct net *); * Obviously, the last two criteria only matter for POSIX locks. */ struct file_lock { - struct file_lock *fl_next; /* singly linked list for this inode */ + struct file_lock *fl_blocker; /* The lock, that is blocking us */ struct list_head fl_list; /* link into file_lock_context */ struct hlist_node fl_link; /* node in global lists */ - struct list_head fl_block; /* circular list of blocked processes */ + struct list_head fl_blocked; /* list of requests with ->fl_blocker pointing here */ + struct list_head fl_block; /* node in ->fl_blocker->fl_blocked */ fl_owner_t fl_owner; unsigned int fl_flags; unsigned char fl_type; diff --git a/include/trace/events/filelock.h b/include/trace/events/filelock.h index d1faf3597b9d..0e50f985a390 100644 --- a/include/trace/events/filelock.h +++ b/include/trace/events/filelock.h @@ -68,7 +68,7 @@ DECLARE_EVENT_CLASS(filelock_lock, __field(struct file_lock *, fl) __field(unsigned long, i_ino) __field(dev_t, s_dev) - __field(struct file_lock *, fl_next) + __field(struct file_lock *, fl_blocker) __field(fl_owner_t, fl_owner) __field(unsigned int, fl_pid) __field(unsigned int, fl_flags) @@ -82,7 +82,7 @@ DECLARE_EVENT_CLASS(filelock_lock, __entry->fl = fl ? fl : NULL; __entry->s_dev = inode->i_sb->s_dev; __entry->i_ino = inode->i_ino; - __entry->fl_next = fl ? fl->fl_next : NULL; + __entry->fl_blocker = fl ? fl->fl_blocker : NULL; __entry->fl_owner = fl ? fl->fl_owner : NULL; __entry->fl_pid = fl ? fl->fl_pid : 0; __entry->fl_flags = fl ? fl->fl_flags : 0; @@ -92,9 +92,9 @@ DECLARE_EVENT_CLASS(filelock_lock, __entry->ret = ret; ), - TP_printk("fl=0x%p dev=0x%x:0x%x ino=0x%lx fl_next=0x%p fl_owner=0x%p fl_pid=%u fl_flags=%s fl_type=%s fl_start=%lld fl_end=%lld ret=%d", + TP_printk("fl=0x%p dev=0x%x:0x%x ino=0x%lx fl_blocker=0x%p fl_owner=0x%p fl_pid=%u fl_flags=%s fl_type=%s fl_start=%lld fl_end=%lld ret=%d", __entry->fl, MAJOR(__entry->s_dev), MINOR(__entry->s_dev), - __entry->i_ino, __entry->fl_next, __entry->fl_owner, + __entry->i_ino, __entry->fl_blocker, __entry->fl_owner, __entry->fl_pid, show_fl_flags(__entry->fl_flags), show_fl_type(__entry->fl_type), __entry->fl_start, __entry->fl_end, __entry->ret) @@ -122,7 +122,7 @@ DECLARE_EVENT_CLASS(filelock_lease, __field(struct file_lock *, fl) __field(unsigned long, i_ino) __field(dev_t, s_dev) - __field(struct file_lock *, fl_next) + __field(struct file_lock *, fl_blocker) __field(fl_owner_t, fl_owner) __field(unsigned int, fl_flags) __field(unsigned char, fl_type) @@ -134,7 +134,7 @@ DECLARE_EVENT_CLASS(filelock_lease, __entry->fl = fl ? fl : NULL; __entry->s_dev = inode->i_sb->s_dev; __entry->i_ino = inode->i_ino; - __entry->fl_next = fl ? fl->fl_next : NULL; + __entry->fl_blocker = fl ? fl->fl_blocker : NULL; __entry->fl_owner = fl ? fl->fl_owner : NULL; __entry->fl_flags = fl ? fl->fl_flags : 0; __entry->fl_type = fl ? fl->fl_type : 0; @@ -142,9 +142,9 @@ DECLARE_EVENT_CLASS(filelock_lease, __entry->fl_downgrade_time = fl ? fl->fl_downgrade_time : 0; ), - TP_printk("fl=0x%p dev=0x%x:0x%x ino=0x%lx fl_next=0x%p fl_owner=0x%p fl_flags=%s fl_type=%s fl_break_time=%lu fl_downgrade_time=%lu", + TP_printk("fl=0x%p dev=0x%x:0x%x ino=0x%lx fl_blocker=0x%p fl_owner=0x%p fl_flags=%s fl_type=%s fl_break_time=%lu fl_downgrade_time=%lu", __entry->fl, MAJOR(__entry->s_dev), MINOR(__entry->s_dev), - __entry->i_ino, __entry->fl_next, __entry->fl_owner, + __entry->i_ino, __entry->fl_blocker, __entry->fl_owner, show_fl_flags(__entry->fl_flags), show_fl_type(__entry->fl_type), __entry->fl_break_time, __entry->fl_downgrade_time) From patchwork Thu Aug 9 02:04:41 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10560695 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2ED9E14E5 for ; Thu, 9 Aug 2018 02:07:50 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1200E29EC6 for ; Thu, 9 Aug 2018 02:07:50 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 03D552AC8E; Thu, 9 Aug 2018 02:07:50 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8A9FA2AC8A for ; Thu, 9 Aug 2018 02:07:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727623AbeHIEaF (ORCPT ); Thu, 9 Aug 2018 00:30:05 -0400 Received: from mx2.suse.de ([195.135.220.15]:60492 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725757AbeHIEaF (ORCPT ); Thu, 9 Aug 2018 00:30:05 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay1.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 71A68AE06; Thu, 9 Aug 2018 02:07:40 +0000 (UTC) From: NeilBrown To: Jeff Layton , Alexander Viro Date: Thu, 09 Aug 2018 12:04:41 +1000 Subject: [PATCH 2/5] fs/locks: allow a lock request to block other requests. Cc: "J. Bruce Fields" , Martin Wilck , linux-fsdevel@vger.kernel.org, Frank Filz , linux-kernel@vger.kernel.org Message-ID: <153378028110.1220.8149340480666080233.stgit@noble> In-Reply-To: <153378012255.1220.6754153662007899557.stgit@noble> References: <153378012255.1220.6754153662007899557.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Currently, a lock can block pending requests, but all pending requests are equal. If lots of pending requests are mutually exclusive, this means they will all be woken up and all but one will fail. This can hurt performance. So we will allow pending requests to block other requests. Only the first request will be woken, and it will wake the others. This patch doesn't implement this fully, but prepares the way. - It acknowledges that a request might be blocking other requests, and when the request is converted to a lock, those blocked requests are moved across. - When a request is removed, all blocked requests are woken. - When deadlock-detection looks for the lock which blocks a given request, we follow the chain of ->fl_blocker all the way to the top. Signed-off-by: NeilBrown --- fs/locks.c | 58 ++++++++++++++++++++++++++++++++++++++++------------------ 1 file changed, 40 insertions(+), 18 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index 322491e70e41..b4812da2a374 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -408,9 +408,19 @@ void locks_copy_lock(struct file_lock *new, struct file_lock *fl) fl->fl_ops->fl_copy_lock(new, fl); } } - EXPORT_SYMBOL(locks_copy_lock); +static void locks_move_blocks(struct file_lock *new, struct file_lock *fl) +{ + struct file_lock *f; + + spin_lock(&blocked_lock_lock); + list_splice_init(&fl->fl_blocked, &new->fl_blocked); + list_for_each_entry(f, &fl->fl_blocked, fl_block) + f->fl_blocker = new; + spin_unlock(&blocked_lock_lock); +} + static inline int flock_translate_cmd(int cmd) { if (cmd & LOCK_MAND) return cmd & (LOCK_MAND | LOCK_RW); @@ -681,9 +691,25 @@ static void __locks_delete_block(struct file_lock *waiter) waiter->fl_blocker = NULL; } +static void __locks_wake_up_blocks(struct file_lock *blocker) +{ + while (!list_empty(&blocker->fl_blocked)) { + struct file_lock *waiter; + + waiter = list_first_entry(&blocker->fl_blocked, + struct file_lock, fl_block); + __locks_delete_block(waiter); + if (waiter->fl_lmops && waiter->fl_lmops->lm_notify) + waiter->fl_lmops->lm_notify(waiter); + else + wake_up(&waiter->fl_wait); + } +} + static void locks_delete_block(struct file_lock *waiter) { spin_lock(&blocked_lock_lock); + __locks_wake_up_blocks(waiter); __locks_delete_block(waiter); spin_unlock(&blocked_lock_lock); } @@ -735,17 +761,7 @@ static void locks_wake_up_blocks(struct file_lock *blocker) return; spin_lock(&blocked_lock_lock); - while (!list_empty(&blocker->fl_blocked)) { - struct file_lock *waiter; - - waiter = list_first_entry(&blocker->fl_blocked, - struct file_lock, fl_block); - __locks_delete_block(waiter); - if (waiter->fl_lmops && waiter->fl_lmops->lm_notify) - waiter->fl_lmops->lm_notify(waiter); - else - wake_up(&waiter->fl_wait); - } + __locks_wake_up_blocks(blocker); spin_unlock(&blocked_lock_lock); } @@ -888,8 +904,11 @@ static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl) struct file_lock *fl; hash_for_each_possible(blocked_hash, fl, fl_link, posix_owner_key(block_fl)) { - if (posix_same_owner(fl, block_fl)) - return fl->fl_blocker; + if (posix_same_owner(fl, block_fl)) { + while (fl->fl_blocker) + fl = fl->fl_blocker; + return fl; + } } return NULL; } @@ -982,6 +1001,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request) if (request->fl_flags & FL_ACCESS) goto out; locks_copy_lock(new_fl, request); + locks_move_blocks(new_fl, request); locks_insert_lock_ctx(new_fl, &ctx->flc_flock); new_fl = NULL; error = 0; @@ -1174,6 +1194,7 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request, goto out; } locks_copy_lock(new_fl, request); + locks_move_blocks(new_fl, request); locks_insert_lock_ctx(new_fl, &fl->fl_list); fl = new_fl; new_fl = NULL; @@ -2582,13 +2603,14 @@ void locks_remove_file(struct file *filp) int posix_unblock_lock(struct file_lock *waiter) { - int status = 0; + int status = -ENOENT; spin_lock(&blocked_lock_lock); - if (waiter->fl_blocker) + if (waiter->fl_blocker) { + __locks_wake_up_blocks(waiter); __locks_delete_block(waiter); - else - status = -ENOENT; + status = 0; + } spin_unlock(&blocked_lock_lock); return status; } From patchwork Thu Aug 9 02:04:41 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10560699 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CBE0C15A6 for ; Thu, 9 Aug 2018 02:08:03 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B28112AC95 for ; Thu, 9 Aug 2018 02:08:03 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A6F9A2AC97; Thu, 9 Aug 2018 02:08:03 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 392592AC95 for ; Thu, 9 Aug 2018 02:08:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727704AbeHIEaN (ORCPT ); Thu, 9 Aug 2018 00:30:13 -0400 Received: from mx2.suse.de ([195.135.220.15]:60536 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725757AbeHIEaL (ORCPT ); Thu, 9 Aug 2018 00:30:11 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 7B5C2AECC; Thu, 9 Aug 2018 02:07:47 +0000 (UTC) From: NeilBrown To: Jeff Layton , Alexander Viro Date: Thu, 09 Aug 2018 12:04:41 +1000 Subject: [PATCH 3/5] fs/locks: change all *_conflict() functions to return a new enum. Cc: "J. Bruce Fields" , Martin Wilck , linux-fsdevel@vger.kernel.org, Frank Filz , linux-kernel@vger.kernel.org Message-ID: <153378028114.1220.3708291796442871726.stgit@noble> In-Reply-To: <153378012255.1220.6754153662007899557.stgit@noble> References: <153378012255.1220.6754153662007899557.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP In a future patch we will need to differentiate between conflicts that are "transitive" and those that aren't. A "transitive" conflict is defined as one where any lock that conflicts with the first (newly requested) lock would conflict with the existing lock. So change posix_locks_conflict(), flock_locks_conflict() (both currently returning int) and leases_conflict() (currently returning bool) to return "enum conflict". Add locks_transitive_overlap() to make it possible to compute the correct conflict for posix locks. The FL_NO_CONFLICT value is zero, so current code which only tests the truth value of these functions will still work the same way. And convert some return (foo); to return foo; Signed-off-by: NeilBrown --- fs/locks.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 49 insertions(+), 15 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index b4812da2a374..d06658b2dc7a 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -139,6 +139,16 @@ #define IS_OFDLCK(fl) (fl->fl_flags & FL_OFDLCK) #define IS_REMOTELCK(fl) (fl->fl_pid <= 0) +/* A transitive conflict is one where the first lock conflicts with + * the second lock, and any other lock that conflicts with the + * first lock, also conflicts with the second lock. + */ +enum conflict { + FL_NO_CONFLICT = 0, + FL_CONFLICT, + FL_TRANSITIVE_CONFLICT, +}; + static inline bool is_remote_lock(struct file *filp) { return likely(!(filp->f_path.dentry->d_sb->s_flags & SB_NOREMOTELOCK)); @@ -612,6 +622,15 @@ static inline int locks_overlap(struct file_lock *fl1, struct file_lock *fl2) (fl2->fl_end >= fl1->fl_start)); } +/* Check for transitive-overlap - true if any lock that overlaps + * the first lock must overlap the seconds + */ +static inline bool locks_transitive_overlap(struct file_lock *fl1, + struct file_lock *fl2) +{ + return (fl1->fl_start >= fl2->fl_start) && + (fl1->fl_end <= fl2->fl_end); +} /* * Check whether two locks have the same owner. */ @@ -793,47 +812,61 @@ locks_delete_lock_ctx(struct file_lock *fl, struct list_head *dispose) /* Determine if lock sys_fl blocks lock caller_fl. Common functionality * checks for shared/exclusive status of overlapping locks. */ -static int locks_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl) +static enum conflict locks_conflict(struct file_lock *caller_fl, + struct file_lock *sys_fl) { if (sys_fl->fl_type == F_WRLCK) - return 1; + return FL_TRANSITIVE_CONFLICT; if (caller_fl->fl_type == F_WRLCK) - return 1; - return 0; + return FL_CONFLICT; + return FL_NO_CONFLICT; } /* Determine if lock sys_fl blocks lock caller_fl. POSIX specific * checking before calling the locks_conflict(). */ -static int posix_locks_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl) +static enum conflict posix_locks_conflict(struct file_lock *caller_fl, + struct file_lock *sys_fl) { /* POSIX locks owned by the same process do not conflict with * each other. */ if (posix_same_owner(caller_fl, sys_fl)) - return (0); + return FL_NO_CONFLICT; /* Check whether they overlap */ if (!locks_overlap(caller_fl, sys_fl)) - return 0; + return FL_NO_CONFLICT; - return (locks_conflict(caller_fl, sys_fl)); + switch (locks_conflict(caller_fl, sys_fl)) { + default: + case FL_NO_CONFLICT: + return FL_NO_CONFLICT; + case FL_CONFLICT: + return FL_CONFLICT; + case FL_TRANSITIVE_CONFLICT: + if (locks_transitive_overlap(caller_fl, sys_fl)) + return FL_TRANSITIVE_CONFLICT; + else + return FL_CONFLICT; + } } /* Determine if lock sys_fl blocks lock caller_fl. FLOCK specific * checking before calling the locks_conflict(). */ -static int flock_locks_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl) +static enum conflict flock_locks_conflict(struct file_lock *caller_fl, + struct file_lock *sys_fl) { /* FLOCK locks referring to the same filp do not conflict with * each other. */ if (caller_fl->fl_file == sys_fl->fl_file) - return (0); + return FL_NO_CONFLICT; if ((caller_fl->fl_type & LOCK_MAND) || (sys_fl->fl_type & LOCK_MAND)) - return 0; + return FL_NO_CONFLICT; - return (locks_conflict(caller_fl, sys_fl)); + return locks_conflict(caller_fl, sys_fl); } void @@ -1435,12 +1468,13 @@ static void time_out_leases(struct inode *inode, struct list_head *dispose) } } -static bool leases_conflict(struct file_lock *lease, struct file_lock *breaker) +static enum conflict leases_conflict(struct file_lock *lease, + struct file_lock *breaker) { if ((breaker->fl_flags & FL_LAYOUT) != (lease->fl_flags & FL_LAYOUT)) - return false; + return FL_NO_CONFLICT; if ((breaker->fl_flags & FL_DELEG) && (lease->fl_flags & FL_LEASE)) - return false; + return FL_NO_CONFLICT; return locks_conflict(breaker, lease); } From patchwork Thu Aug 9 02:04:41 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10560697 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1214E14E5 for ; Thu, 9 Aug 2018 02:08:00 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id ED1C42AC8F for ; Thu, 9 Aug 2018 02:07:59 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E15672AC95; Thu, 9 Aug 2018 02:07:59 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9B4642AC8F for ; Thu, 9 Aug 2018 02:07:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727805AbeHIEaS (ORCPT ); Thu, 9 Aug 2018 00:30:18 -0400 Received: from mx2.suse.de ([195.135.220.15]:60590 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725757AbeHIEaS (ORCPT ); Thu, 9 Aug 2018 00:30:18 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay1.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 3A05AACE8; Thu, 9 Aug 2018 02:07:54 +0000 (UTC) From: NeilBrown To: Jeff Layton , Alexander Viro Date: Thu, 09 Aug 2018 12:04:41 +1000 Subject: [PATCH 4/5] fs/locks: split out __locks_wake_one() Cc: "J. Bruce Fields" , Martin Wilck , linux-fsdevel@vger.kernel.org, Frank Filz , linux-kernel@vger.kernel.org Message-ID: <153378028117.1220.6771440459821567385.stgit@noble> In-Reply-To: <153378012255.1220.6754153662007899557.stgit@noble> References: <153378012255.1220.6754153662007899557.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This functionality will be useful in the next patch, so split it out from __locks_wake_up_blocks(). Signed-off-by: NeilBrown --- fs/locks.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index d06658b2dc7a..fc64016d01ee 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -710,6 +710,15 @@ static void __locks_delete_block(struct file_lock *waiter) waiter->fl_blocker = NULL; } +static void __locks_wake_one(struct file_lock *waiter) +{ + __locks_delete_block(waiter); + if (waiter->fl_lmops && waiter->fl_lmops->lm_notify) + waiter->fl_lmops->lm_notify(waiter); + else + wake_up(&waiter->fl_wait); +} + static void __locks_wake_up_blocks(struct file_lock *blocker) { while (!list_empty(&blocker->fl_blocked)) { @@ -717,11 +726,7 @@ static void __locks_wake_up_blocks(struct file_lock *blocker) waiter = list_first_entry(&blocker->fl_blocked, struct file_lock, fl_block); - __locks_delete_block(waiter); - if (waiter->fl_lmops && waiter->fl_lmops->lm_notify) - waiter->fl_lmops->lm_notify(waiter); - else - wake_up(&waiter->fl_wait); + __locks_wake_one(waiter); } } From patchwork Thu Aug 9 02:04:41 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10560701 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2601E14E5 for ; Thu, 9 Aug 2018 02:08:10 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0C7C02AC98 for ; Thu, 9 Aug 2018 02:08:10 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 009142AC9E; Thu, 9 Aug 2018 02:08:09 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 75C482AC98 for ; Thu, 9 Aug 2018 02:08:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727830AbeHIEaZ (ORCPT ); Thu, 9 Aug 2018 00:30:25 -0400 Received: from mx2.suse.de ([195.135.220.15]:60652 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725757AbeHIEaZ (ORCPT ); Thu, 9 Aug 2018 00:30:25 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 05A8DAB99; Thu, 9 Aug 2018 02:08:01 +0000 (UTC) From: NeilBrown To: Jeff Layton , Alexander Viro Date: Thu, 09 Aug 2018 12:04:41 +1000 Subject: [PATCH 5/5] fs/locks: create a tree of dependent requests. Cc: "J. Bruce Fields" , Martin Wilck , linux-fsdevel@vger.kernel.org, Frank Filz , linux-kernel@vger.kernel.org Message-ID: <153378028121.1220.4418653283078446336.stgit@noble> In-Reply-To: <153378012255.1220.6754153662007899557.stgit@noble> References: <153378012255.1220.6754153662007899557.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When we find an existing lock which conflicts with a request, and the request wants to wait, we currently add the request to a list. When the lock is removed, the whole list is woken. This can cause the thundering-herd problem. To reduce the problem, we make use of the (new) fact that a pending request can itself have a list of blocked requests. When we find a conflict, we look through the existing blocked requests. If any one of them blocks the new request, the new request is attached below that request. This way, when the lock is released, only a set of non-conflicting locks will be woken. The rest of the herd can stay asleep. Reported-and-tested-by: Martin Wilck Signed-off-by: NeilBrown --- fs/locks.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 63 insertions(+), 6 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index fc64016d01ee..17843feb6f5b 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -738,6 +738,39 @@ static void locks_delete_block(struct file_lock *waiter) spin_unlock(&blocked_lock_lock); } +static void wake_non_conflicts(struct file_lock *waiter, struct file_lock *blocker, + enum conflict conflict(struct file_lock *, + struct file_lock *)) +{ + struct file_lock *parent = waiter; + struct file_lock *fl; + struct file_lock *t; + + fl = list_entry(&parent->fl_blocked, struct file_lock, fl_block); +restart: + list_for_each_entry_safe_continue(fl, t, &parent->fl_blocked, fl_block) { + switch (conflict(fl, blocker)) { + default: + case FL_NO_CONFLICT: + __locks_wake_one(fl); + break; + case FL_CONFLICT: + /* Need to check children */ + parent = fl; + fl = list_entry(&parent->fl_blocked, struct file_lock, fl_block); + goto restart; + case FL_TRANSITIVE_CONFLICT: + /* all children must also conflict, no need to check */ + continue; + } + } + if (parent != waiter) { + parent = parent->fl_blocker; + fl = parent; + goto restart; + } +} + /* Insert waiter into blocker's block list. * We use a circular list so that processes can be easily woken up in * the order they blocked. The documentation doesn't require this but @@ -747,11 +780,32 @@ static void locks_delete_block(struct file_lock *waiter) * fl_blocked list itself is protected by the blocked_lock_lock, but by ensuring * that the flc_lock is also held on insertions we can avoid taking the * blocked_lock_lock in some cases when we see that the fl_blocked list is empty. + * + * Rather than just adding to the list, we check for conflicts with any existing + * waiter, and add to that waiter instead. + * Thus wakeups don't happen until needed. */ static void __locks_insert_block(struct file_lock *blocker, - struct file_lock *waiter) + struct file_lock *waiter, + enum conflict conflict(struct file_lock *, + struct file_lock *)) { + struct file_lock *fl; BUG_ON(!list_empty(&waiter->fl_block)); + + /* Any request in waiter->fl_blocked is know to conflict with + * waiter, but it might not conflict with blocker. + * If it doesn't, it needs to be woken now so it can find + * somewhere else to wait, or possible it can get granted. + */ + if (conflict(waiter, blocker) != FL_TRANSITIVE_CONFLICT) + wake_non_conflicts(waiter, blocker, conflict); +new_blocker: + list_for_each_entry(fl, &blocker->fl_blocked, fl_block) + if (conflict(fl, waiter)) { + blocker = fl; + goto new_blocker; + } waiter->fl_blocker = blocker; list_add_tail(&waiter->fl_block, &blocker->fl_blocked); if (IS_POSIX(blocker) && !IS_OFDLCK(blocker)) @@ -760,10 +814,12 @@ static void __locks_insert_block(struct file_lock *blocker, /* Must be called with flc_lock held. */ static void locks_insert_block(struct file_lock *blocker, - struct file_lock *waiter) + struct file_lock *waiter, + enum conflict conflict(struct file_lock *, + struct file_lock *)) { spin_lock(&blocked_lock_lock); - __locks_insert_block(blocker, waiter); + __locks_insert_block(blocker, waiter, conflict); spin_unlock(&blocked_lock_lock); } @@ -1033,7 +1089,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request) if (!(request->fl_flags & FL_SLEEP)) goto out; error = FILE_LOCK_DEFERRED; - locks_insert_block(fl, request); + locks_insert_block(fl, request, flock_locks_conflict); goto out; } if (request->fl_flags & FL_ACCESS) @@ -1107,7 +1163,8 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request, spin_lock(&blocked_lock_lock); if (likely(!posix_locks_deadlock(request, fl))) { error = FILE_LOCK_DEFERRED; - __locks_insert_block(fl, request); + __locks_insert_block(fl, request, + posix_locks_conflict); } spin_unlock(&blocked_lock_lock); goto out; @@ -1581,7 +1638,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type) break_time -= jiffies; if (break_time == 0) break_time++; - locks_insert_block(fl, new_fl); + locks_insert_block(fl, new_fl, leases_conflict); trace_break_lease_block(inode, new_fl); spin_unlock(&ctx->flc_lock); percpu_up_read_preempt_enable(&file_rwsem);