
mds: remove waiting lock before merging with neighbours

Message ID 1375110344-26133-1-git-send-email-ddiss@suse.de (mailing list archive)
State New, archived

Commit Message

David Disseldorp July 29, 2013, 3:05 p.m. UTC
CephFS currently deadlocks under CTDB's ping_pong POSIX locking test
when run concurrently on multiple nodes.
The deadlock is caused by the failed removal of a waiting_locks entry when
the waiting lock is merged with an existing lock, e.g.:

Initial MDS state (two clients, same file):
held_locks -- start: 0, length: 1, client: 4116, pid: 7899, type: 2
	      start: 2, length: 1, client: 4110, pid: 40767, type: 2
waiting_locks -- start: 1, length: 1, client: 4116, pid: 7899, type: 2

Waiting lock entry 4116@1:1 fires:
handle_client_file_setlock: start: 1, length: 1,
			    client: 4116, pid: 7899, type: 2

MDS state after lock is obtained:
held_locks -- start: 0, length: 2, client: 4116, pid: 7899, type: 2
	      start: 2, length: 1, client: 4110, pid: 40767, type: 2
waiting_locks -- start: 1, length: 1, client: 4116, pid: 7899, type: 2

Note that the waiting 4116@1:1 lock entry is merged with the existing
4116@0:1 held lock to become a 4116@0:2 held lock. However, the now-handled
4116@1:1 waiting_locks entry remains.
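
A minimal standalone sketch of the failure (not the actual flock.cc code; it
only assumes that waiting_locks is a multimap keyed by the lock's start
offset):

    #include <cstdint>
    #include <iostream>
    #include <map>

    struct filelock { uint64_t start; uint64_t length; int64_t client; };

    int main() {
      // Waiting entry queued under its original start offset (1).
      std::multimap<uint64_t, filelock> waiting_locks;
      filelock new_lock{1, 1, 4116};
      waiting_locks.insert(std::make_pair(new_lock.start, new_lock));

      // The merge rewrites the lock in place: 4116@1:1 becomes 4116@0:2.
      new_lock.start = 0;
      new_lock.length = 2;

      // A removal keyed on the rewritten start now misses the queued entry.
      std::cout << "entries under new start: "
                << waiting_locks.count(new_lock.start) << "\n";  // prints 0
      std::cout << "stale entries under old start: "
                << waiting_locks.count(1) << "\n";               // prints 1
      return 0;
    }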

When handling a lock request, the MDS calls adjust_locks() to merge
the new lock with available neighbours. If the new lock is merged,
then the waiting_locks entry is not located in the subsequent
remove_waiting() call.
This fix ensures that the waiting_locks entry is removed prior to
modification during merge.
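
Under the same toy model, the fix amounts to doing the erase while the key
still matches, mirroring the move of remove_waiting() ahead of adjust_locks()
in the patch below:

    #include <cstdint>
    #include <map>

    struct filelock { uint64_t start; uint64_t length; int64_t client; };

    int main() {
      std::multimap<uint64_t, filelock> waiting_locks;
      filelock new_lock{1, 1, 4116};
      waiting_locks.insert(std::make_pair(new_lock.start, new_lock));

      // Erase first, while new_lock.start (1) is still the key the entry
      // was queued under ...
      waiting_locks.erase(new_lock.start);

      // ... and only then let the merge rewrite the lock to 4116@0:2.
      new_lock.start = 0;
      new_lock.length = 2;

      return waiting_locks.empty() ? 0 : 1;  // 0: no stale entry left behind
    }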

Signed-off-by: David Disseldorp <ddiss@suse.de>
---
 src/mds/flock.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
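
For context, the held/waiting ranges in the commit message are the kind
produced by cycling one-byte write locks over a small shared file. The loop
below is a rough approximation of that access pattern, not ctdb's actual
ping_pong.c, and the file name and byte count are arbitrary. Run concurrently
from multiple CephFS clients, a blocking F_SETLKW on a byte adjacent to one
the caller already holds is what sets up the held/waiting/merge sequence
described above.

    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdio>
    #include <cstring>

    // Take or release a one-byte POSIX lock; F_SETLKW blocks until granted.
    static int byte_lock(int fd, short type, off_t start) {
      struct flock fl;
      std::memset(&fl, 0, sizeof(fl));
      fl.l_type = type;        // F_WRLCK to acquire, F_UNLCK to release
      fl.l_whence = SEEK_SET;
      fl.l_start = start;
      fl.l_len = 1;
      return fcntl(fd, F_SETLKW, &fl);
    }

    int main() {
      int fd = open("pingpong.dat", O_RDWR | O_CREAT, 0644);
      if (fd < 0) { std::perror("open"); return 1; }

      const int num_bytes = 3;
      int i = 0;
      for (int rounds = 0; rounds < 1000; ++rounds) {
        byte_lock(fd, F_WRLCK, (i + 1) % num_bytes);  // grab the next byte ...
        byte_lock(fd, F_UNLCK, i);                    // ... then drop the previous one
        i = (i + 1) % num_bytes;
      }
      close(fd);
      return 0;
    }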

Comments

David Disseldorp Aug. 1, 2013, 12:11 p.m. UTC | #1
Hi,

Did anyone get a chance to look at this change?
Any comments/feedback/ridicule would be appreciated.

Cheers, David
Sage Weil Aug. 1, 2013, 6:07 p.m. UTC | #2
On Thu, 1 Aug 2013, David Disseldorp wrote:
> Hi,
> 
> Did anyone get a chance to look at this change?
> Any comments/feedback/ridicule would be appreciated.

Sorry, not yet--and Greg just headed out for vacation yesterday.  It's on 
my list to look at when I have some time tonight or tomorrow, though. 
Thanks!  

I'm hopeful this will clear up some of the locking hangs we've seen with 
the samba and flock tests...

sage


> 
> Cheers, David
Gregory Farnum Aug. 23, 2013, 8:58 p.m. UTC | #3
Hi David,
I'm really sorry it took us so long to get back to you on this. :(
However, I've reviewed the patch and, apart from going over the code
making me want to strangle myself for structuring it that way,
everything looks good. I changed the last paragraph in the commit
message very slightly to clarify the cause of the bug:

On Mon, Jul 29, 2013 at 8:05 AM, David Disseldorp <ddiss@suse.de> wrote:
> When handling a lock request, the MDS calls adjust_locks() to merge
> the new lock with available neighbours. If the new lock is merged,
> then the waiting_locks entry is not located in the subsequent
> remove_waiting() call.
> This fix ensures that the waiting_locks entry is removed prior to
> modification during merge.

    When handling a lock request, the MDS calls adjust_locks() to merge
    the new lock with available neighbours. If the new lock is merged,
    then the waiting_locks entry is not located in the subsequent
    remove_waiting() call because adjust_locks changed the new lock to
    include the old locks.
    This fix ensures that the waiting_locks entry is removed prior to
    modification during merge.

And it's now merged into master and backported to dumpling. Thank you very much!

If you feel like cleaning up the locking code a little more (or
anything else, for that matter) I can promise you faster reviews in
the future... ;)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
David Disseldorp Aug. 24, 2013, 12:02 p.m. UTC | #4
On Fri, 23 Aug 2013 13:58:56 -0700
Gregory Farnum <greg@inktank.com> wrote:

>     When handling a lock request, the MDS calls adjust_locks() to merge
>     the new lock with available neighbours. If the new lock is merged,
>     then the waiting_locks entry is not located in the subsequent
>     remove_waiting() call because adjust_locks changed the new lock to
>     include the old locks.
>     This fix ensures that the waiting_locks entry is removed prior to
>     modification during merge.

Looks good.

> And it's now merged into master and backported to dumpling. Thank you very much!

Great, thanks Gregory.

Cheers, David

Patch

diff --git a/src/mds/flock.cc b/src/mds/flock.cc
index e83c5ee..5e329af 100644
--- a/src/mds/flock.cc
+++ b/src/mds/flock.cc
@@ -75,12 +75,14 @@  bool ceph_lock_state_t::add_lock(ceph_filelock& new_lock,
       } else {
         //yay, we can insert a shared lock
         dout(15) << "inserting shared lock" << dendl;
+        remove_waiting(new_lock);
         adjust_locks(self_overlapping_locks, new_lock, neighbor_locks);
         held_locks.insert(pair<uint64_t, ceph_filelock>(new_lock.start, new_lock));
         ret = true;
       }
     }
   } else { //no overlapping locks except our own
+    remove_waiting(new_lock);
     adjust_locks(self_overlapping_locks, new_lock, neighbor_locks);
     dout(15) << "no conflicts, inserting " << new_lock << dendl;
     held_locks.insert(pair<uint64_t, ceph_filelock>
@@ -89,7 +91,6 @@  bool ceph_lock_state_t::add_lock(ceph_filelock& new_lock,
   }
   if (ret) {
     ++client_held_lock_counts[(client_t)new_lock.client];
-    remove_waiting(new_lock);
   }
   else if (wait_on_fail && !replay)
     ++client_waiting_lock_counts[(client_t)new_lock.client];
@@ -306,7 +307,7 @@  void ceph_lock_state_t::adjust_locks(list<multimap<uint64_t, ceph_filelock>::ite
     old_lock = &(*iter)->second;
     old_lock_client = old_lock->client;
     dout(15) << "lock to coalesce: " << *old_lock << dendl;
-    /* because if it's a neibhoring lock there can't be any self-overlapping
+    /* because if it's a neighboring lock there can't be any self-overlapping
        locks that covered it */
     if (old_lock->type == new_lock.type) { //merge them
       if (0 == new_lock.length) {