
net: core: Correct the sock::sk_lock.owned lockdep annotations

Message ID 20210918114626.399467843@linutronix.de (mailing list archive)
State Accepted
Commit 2dcb96bacce36021c2f3eaae0cef607b5bb71ede
Delegated to: Netdev Maintainers
Series net: core: Correct the sock::sk_lock.owned lockdep annotations

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Guessed tree name to be net-next
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cc_maintainers warning 7 maintainers not CCed: pabeni@redhat.com aahringo@redhat.com fw@strlen.de xiangxia.m.yue@gmail.com bjorn@kernel.org mathew.j.martineau@linux.intel.com yangbo.lu@nxp.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 3094 this patch: 3094
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 71 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 3204 this patch: 3204
netdev/header_inline success Link

Commit Message

Thomas Gleixner Sept. 18, 2021, 12:42 p.m. UTC
lock_sock_fast() and lock_sock_nested() contain lockdep annotations for the
sock::sk_lock.owned 'mutex'. sock::sk_lock.owned is not a regular mutex. It
is just lockdep-wise equivalent. In fact it's an open coded trivial mutex
implementation with some interesting features.

sock::sk_lock.slock is a regular spinlock protecting the 'mutex'
representation sock::sk_lock.owned which is a plain boolean. If 'owned' is
true, then some other task holds the 'mutex', otherwise it is uncontended.
As this locking construct is obviously as prone to lock ordering issues as
any other locking primitive, it is lockdep annotated via a dedicated
dependency map, sock::sk_lock.dep_map, which has to be updated at the lock
and unlock sites.
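
For reference, this state lives in socket_lock_t in include/net/sock.h. A
minimal sketch (abridged and from memory; the exact layout and config
guards may differ between kernel versions):

  typedef struct {
          spinlock_t              slock;   /* protects the 'mutex' state below */
          int                     owned;   /* open coded 'mutex': 0 = free, 1 = held */
          wait_queue_head_t       wq;      /* waiters blocked until 'owned' is released */
  #ifdef CONFIG_DEBUG_LOCK_ALLOC
          struct lockdep_map      dep_map; /* lockdep's view of the 'mutex' */
  #endif
  } socket_lock_t;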

lock_sock_nested() is a straightforward 'mutex' lock operation:

  might_sleep();
  spin_lock_bh(sock::sk_lock.slock)
  while (!try_lock(sock::sk_lock.owned)) {
      spin_unlock_bh(sock::sk_lock.slock);
      wait_for_release();
      spin_lock_bh(sock::sk_lock.slock);
  }

The lockdep annotation for sock::sk_lock.owned is, for unknown reasons,
placed _after_ the lock has been acquired, i.e. after the code block above
and after releasing sock::sk_lock.slock, but inside the bottom halves
disabled region:

  spin_unlock(sock::sk_lock.slock);
  mutex_acquire(&sk->sk_lock.dep_map, subclass, 0, _RET_IP_);
  local_bh_enable();
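
Putting the pieces together, the complete pre-patch lock_sock_nested() (as
reconstructed from the diff at the end of this mail) reads:

  void lock_sock_nested(struct sock *sk, int subclass)
  {
          might_sleep();
          spin_lock_bh(&sk->sk_lock.slock);
          if (sk->sk_lock.owned)
                  __lock_sock(sk);
          sk->sk_lock.owned = 1;
          spin_unlock(&sk->sk_lock.slock);
          /*
           * The sk_lock has mutex_lock() semantics here:
           */
          mutex_acquire(&sk->sk_lock.dep_map, subclass, 0, _RET_IP_);
          local_bh_enable();
  }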

The placement after the unlock is obvious because otherwise the
mutex_acquire() would nest into the spinlock-held region.

But that's from the lockdep perspective still the wrong place:

 1) The mutex_acquire() is issued _after_ the successful acquisition, which
    is pointless because in a deadlock scenario this point is never
    reached, which means that if the deadlock is the first instance of
    exposing the wrong lock order, lockdep does not have a chance to detect
    it.

 2) It only works because lockdep is rather lax on the context from which
    the mutex_acquire() is issued. Acquiring a mutex inside a bottom halves
    disabled, and therefore non-preemptible, region is obviously invalid,
    except for a trylock, which is clearly not the case here.

    This 'works' stops working on RT enabled kernels, where bottom halves
    serialization is done via a local lock. That exposes the misplacement
    because the 'mutex' and the local lock nest the wrong way around, and
    lockdep rightfully complains about a lock inversion.

The placement is wrong since the initial commit a5b5bb9a053a ("[PATCH]
lockdep: annotate sk_locks") which introduced this.

Fix it by moving the mutex_acquire() in front of the actual lock
acquisition, which is what the regular mutex_lock() operation does as well.
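
The resulting lock_sock_nested() (assembled from the patch at the end of
this mail) then looks like this:

  void lock_sock_nested(struct sock *sk, int subclass)
  {
          /* The sk_lock has mutex_lock() semantics here. */
          mutex_acquire(&sk->sk_lock.dep_map, subclass, 0, _RET_IP_);

          might_sleep();
          spin_lock_bh(&sk->sk_lock.slock);
          if (sk->sk_lock.owned)
                  __lock_sock(sk);
          sk->sk_lock.owned = 1;
          spin_unlock_bh(&sk->sk_lock.slock);
  }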

lock_sock_fast() is not that straightforward. At first glance it looks
like a convoluted trylock operation:

  spin_lock_bh(sock::sk_lock.slock)
  if (!sock::sk_lock.owned)
      return false;
  while (!try_lock(sock::sk_lock.owned)) {
      spin_unlock_bh(sock::sk_lock.slock);
      wait_for_release();
      spin_lock_bh(sock::sk_lock.slock);
  }
  spin_unlock(sock::sk_lock.slock);
  mutex_acquire(&sk->sk_lock.dep_map, subclass, 0, _RET_IP_);
  local_bh_enable();
  return true;

But that's not the case: lock_sock_fast() is an interesting optimization
for short critical sections which can run with bottom halves disabled and
sock::sk_lock.slock held. This allows shortcutting the 'mutex' operation in
the non-contended case by preventing other lockers from acquiring
sock::sk_lock.owned, because they are blocked on sock::sk_lock.slock, which
in turn avoids the overhead of the heavy processing in release_sock(),
including waking up wait queue waiters.

In the contended case, i.e. when sock::sk_lock.owned == true, the behavior
is the same as in lock_sock_nested().
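
For illustration, a minimal sketch of the calling convention; the caller
and the work inside the critical section are hypothetical placeholders,
only the lock_sock_fast()/unlock_sock_fast() calls are the real API:

  /* Hypothetical caller sketching the fast-lock pattern */
  static void example_short_update(struct sock *sk)
  {
          /* true means the slow path (full 'mutex' acquisition) was taken */
          bool slow = lock_sock_fast(sk);

          /*
           * Short critical section. In the fast path this runs with
           * bottom halves disabled and sock::sk_lock.slock held.
           */
          sk->sk_err = 0;         /* placeholder work */

          /* Fast path: just drop slock. Slow path: release_sock(). */
          unlock_sock_fast(sk, slow);
  }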

Semantically this shortcut means that the task acquired the 'mutex' even
though it does not touch the sock::sk_lock.owned field in the non-contended
case. Not telling lockdep about this shortcut acquisition hides potential
lock ordering violations in the fast path.

As a consequence, the same reasoning as above for lock_sock_nested()
applies to the placement of the lockdep annotation.

The current placement of the lockdep annotation was just copied from the
original lock_sock() implementation, which has since been renamed to
lock_sock_nested().

Fix this by moving the mutex_acquire() in front of the actual lock
acquisition and adding the corresponding mutex_release() into
unlock_sock_fast(). Also document the fast path return case with a comment.

Reported-by: Sebastian Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: netdev@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
---

The basic network testing I was able to do did not expose any lockdep
complaints, and as the probability is low that a potential lock order
violation is hidden by the fact that the slow path in lock_sock_fast() is
never taken, I'm not expecting to see much fallout from this.

---
 include/net/sock.h |    1 +
 net/core/sock.c    |   37 +++++++++++++++++++++++--------------
 2 files changed, 24 insertions(+), 14 deletions(-)
---

Comments

patchwork-bot+netdevbpf@kernel.org Sept. 19, 2021, 12:10 p.m. UTC | #1
Hello:

This patch was applied to netdev/net.git (refs/heads/master):

On Sat, 18 Sep 2021 14:42:35 +0200 (CEST) you wrote:
> lock_sock_fast() and lock_sock_nested() contain lockdep annotations for the
> sock::sk_lock.owned 'mutex'. sock::sk_lock.owned is not a regular mutex. It
> is just lockdep wise equivalent. In fact it's an open coded trivial mutex
> implementation with some interesting features.
> 
> sock::sk_lock.slock is a regular spinlock protecting the 'mutex'
> representation sock::sk_lock.owned which is a plain boolean. If 'owned' is
> true, then some other task holds the 'mutex', otherwise it is uncontended.
> As this locking construct is obviously endangered by lock ordering issues as
> any other locking primitive it got lockdep annotated via a dedicated
> dependency map sock::sk_lock.dep_map which has to be updated at the lock
> and unlock sites.
> 
> [...]

Here is the summary with links:
  - net: core: Correct the sock::sk_lock.owned lockdep annotations
    https://git.kernel.org/netdev/net/c/2dcb96bacce3

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

Patch

--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1640,6 +1640,7 @@  static inline void unlock_sock_fast(stru
 		release_sock(sk);
 		__release(&sk->sk_lock.slock);
 	} else {
+		mutex_release(&sk->sk_lock.dep_map, _RET_IP_);
 		spin_unlock_bh(&sk->sk_lock.slock);
 	}
 }
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3179,17 +3179,15 @@  EXPORT_SYMBOL(sock_init_data);
 
 void lock_sock_nested(struct sock *sk, int subclass)
 {
+	/* The sk_lock has mutex_lock() semantics here. */
+	mutex_acquire(&sk->sk_lock.dep_map, subclass, 0, _RET_IP_);
+
 	might_sleep();
 	spin_lock_bh(&sk->sk_lock.slock);
 	if (sk->sk_lock.owned)
 		__lock_sock(sk);
 	sk->sk_lock.owned = 1;
-	spin_unlock(&sk->sk_lock.slock);
-	/*
-	 * The sk_lock has mutex_lock() semantics here:
-	 */
-	mutex_acquire(&sk->sk_lock.dep_map, subclass, 0, _RET_IP_);
-	local_bh_enable();
+	spin_unlock_bh(&sk->sk_lock.slock);
 }
 EXPORT_SYMBOL(lock_sock_nested);
 
@@ -3227,24 +3225,35 @@  EXPORT_SYMBOL(release_sock);
  */
 bool lock_sock_fast(struct sock *sk) __acquires(&sk->sk_lock.slock)
 {
+	/* The sk_lock has mutex_lock() semantics here. */
+	mutex_acquire(&sk->sk_lock.dep_map, 0, 0, _RET_IP_);
+
 	might_sleep();
 	spin_lock_bh(&sk->sk_lock.slock);
 
-	if (!sk->sk_lock.owned)
+	if (!sk->sk_lock.owned) {
 		/*
-		 * Note : We must disable BH
+		 * Fast path return with bottom halves disabled and
+		 * sock::sk_lock.slock held.
+		 *
+		 * The 'mutex' is not contended and holding
+		 * sock::sk_lock.slock prevents all other lockers to
+		 * proceed so the corresponding unlock_sock_fast() can
+		 * avoid the slow path of release_sock() completely and
+		 * just release slock.
+		 *
+		 * From a semantical POV this is equivalent to 'acquiring'
+		 * the 'mutex', hence the corresponding lockdep
+		 * mutex_release() has to happen in the fast path of
+		 * unlock_sock_fast().
 		 */
 		return false;
+	}
 
 	__lock_sock(sk);
 	sk->sk_lock.owned = 1;
-	spin_unlock(&sk->sk_lock.slock);
-	/*
-	 * The sk_lock has mutex_lock() semantics here:
-	 */
-	mutex_acquire(&sk->sk_lock.dep_map, 0, 0, _RET_IP_);
 	__acquire(&sk->sk_lock.slock);
-	local_bh_enable();
+	spin_unlock_bh(&sk->sk_lock.slock);
 	return true;
 }
 EXPORT_SYMBOL(lock_sock_fast);