From patchwork Thu Jul 31 10:16:37 2014
X-Patchwork-Submitter: Ilya Dryomov
X-Patchwork-Id: 4654661
From: Ilya Dryomov
To: linux-kernel@vger.kernel.org
Cc: Peter Zijlstra, Ingo Molnar, ceph-devel@vger.kernel.org
Subject: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra reschedule point"
Date: Thu, 31 Jul 2014 14:16:37 +0400
Message-Id: <1406801797-20139-1-git-send-email-ilya.dryomov@inktank.com>

This reverts commit 34c6bc2c919a55e5ad4e698510a2f35ee13ab900.

This commit can lead to deadlocks by way of what, at a high level, looks
like a missing wakeup on mutex_unlock() when CONFIG_MUTEX_SPIN_ON_OWNER
is set, which is how most distributions ship their kernels.  In
particular, it causes reproducible deadlocks in libceph/rbd code under
higher than moderate loads, with the evidence pointing to the bowels of
mutex_lock().
kernel/locking/mutex.c, __mutex_lock_common():

 476            osq_unlock(&lock->osq);
 477    slowpath:
 478            /*
 479             * If we fell out of the spin path because of need_resched(),
 480             * reschedule now, before we try-lock the mutex. This avoids getting
 481             * scheduled out right after we obtained the mutex.
 482             */
 483            if (need_resched())
 484                    schedule_preempt_disabled();   <-- never returns
 485    #endif
 486            spin_lock_mutex(&lock->wait_lock, flags);

We started bumping into deadlocks in QA the day our branch was rebased
onto 3.15 (the release this commit went in), but then, as part of the
debugging effort, I enabled all locking debug options, which also
disabled CONFIG_MUTEX_SPIN_ON_OWNER and made everything disappear,
which is why it hasn't been looked into until now.  The revert makes
the problem go away, confirmed by our users.

Cc: Peter Zijlstra
Cc: stable@vger.kernel.org # 3.15
Signed-off-by: Ilya Dryomov
---
 kernel/locking/mutex.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index acca2c1a3c5e..746ff280a2fc 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -475,13 +475,6 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	}
 	osq_unlock(&lock->osq);
 slowpath:
-	/*
-	 * If we fell out of the spin path because of need_resched(),
-	 * reschedule now, before we try-lock the mutex. This avoids getting
-	 * scheduled out right after we obtained the mutex.
-	 */
-	if (need_resched())
-		schedule_preempt_disabled();
 #endif
 	spin_lock_mutex(&lock->wait_lock, flags);
 
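
[For context only, not part of the patch: below is a minimal, hypothetical
stress sketch of the kind of sustained mutex contention under which the
hangs were observed -- several kthreads repeatedly taking and releasing a
single mutex on a kernel built with CONFIG_MUTEX_SPIN_ON_OWNER=y.  All
names here (mutex_stress, stress_fn, NR_THREADS) are invented for
illustration; the actual reproducer was the libceph/rbd workload described
above.]

	/* Illustrative stress module, assumptions as stated above. */
	#include <linux/module.h>
	#include <linux/kthread.h>
	#include <linux/mutex.h>
	#include <linux/delay.h>
	#include <linux/sched.h>
	#include <linux/err.h>

	#define NR_THREADS 16

	static DEFINE_MUTEX(stress_lock);
	static struct task_struct *threads[NR_THREADS];

	static int stress_fn(void *data)
	{
		while (!kthread_should_stop()) {
			mutex_lock(&stress_lock);
			/* short critical section */
			udelay(10);
			mutex_unlock(&stress_lock);
			cond_resched();
		}
		return 0;
	}

	static int __init stress_init(void)
	{
		int i;

		/* spawn contending kthreads; partial failure just runs fewer */
		for (i = 0; i < NR_THREADS; i++) {
			threads[i] = kthread_run(stress_fn, NULL,
						 "mutex_stress/%d", i);
			if (IS_ERR(threads[i])) {
				threads[i] = NULL;
				break;
			}
		}
		return 0;
	}

	static void __exit stress_exit(void)
	{
		int i;

		for (i = 0; i < NR_THREADS; i++)
			if (threads[i])
				kthread_stop(threads[i]);
	}

	module_init(stress_init);
	module_exit(stress_exit);
	MODULE_LICENSE("GPL");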