From patchwork Thu Sep 8 07:46:19 2022
X-Patchwork-Submitter: Juergen Gross <jgross@suse.com>
X-Patchwork-Id: 12969718
From: Juergen Gross <jgross@suse.com>
To: xen-devel@lists.xenproject.org
Cc: Juergen Gross <jgross@suse.com>, Andrew Cooper, George Dunlap,
    Jan Beulich, Julien Grall, Stefano Stabellini, Wei Liu
Subject: [PATCH] xen/locks: add dynamic lock recursion checks
Date: Thu, 8 Sep 2022 09:46:19 +0200
Message-Id: <20220908074619.18221-1-jgross@suse.com>
X-Mailer: git-send-email 2.35.3

Add checking of lock recursion to the hypervisor. This is done by using a
percpu data array for storing the address of each taken lock. Any attempt
to take a lock twice (with the exception of recursive spinlocks) will
result in a crash.

This is especially meant for detecting attempts to take a rwlock multiple
times as a reader, which will result in a deadlock only if another cpu
tries to get the lock as a writer in between.

The additional checks are not performance neutral, so they are enabled
only in debug builds by default, as the checks are active only with
CONFIG_DEBUG_LOCKS enabled.

The size of the percpu data array can be selected via a boot parameter.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Further possible enhancements:
- record address or file/line of locker, too
- add sanity check when exiting hypervisor or going to idle that all
  locks have been released

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 docs/misc/xen-command-line.pandoc |  12 +++
 xen/common/rwlock.c               |   6 ++
 xen/common/spinlock.c             | 145 ++++++++++++++++++++++++++++++
 xen/include/xen/rwlock.h          |  20 +++++
 xen/include/xen/spinlock.h        |   4 +
 5 files changed, 187 insertions(+)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 9a79385a37..7d228ad5a1 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1631,6 +1631,18 @@ This option is intended for debugging purposes only.
 Enable MSR_DEBUGCTL.LBR in hypervisor context to be able to dump the Last
 Interrupt/Exception To/From record with other registers.
 
+### lock-depth-size
+> `= <integer>`
+
+> Default: `lock-depth-size=64`
+
+Specifies the maximum number of nested locks tested for illegal recursions.
+Higher nesting levels still work, but recursion testing is omitted for those
+levels. In case an illegal recursion is detected the system will crash
+immediately. Specifying `0` will disable all testing of illegal lock nesting.
+
+This option is available for hypervisors built with CONFIG_DEBUG_LOCKS only.
+
 ### loglvl
 > `= <level>[/<rate-limited level>]` where level is `none | error | warning | info | debug | all`
 
diff --git a/xen/common/rwlock.c b/xen/common/rwlock.c
index aa15529bbe..18224a4bb5 100644
--- a/xen/common/rwlock.c
+++ b/xen/common/rwlock.c
@@ -54,6 +54,8 @@ void queue_read_lock_slowpath(rwlock_t *lock)
      * Signal the next one in queue to become queue head.
      */
     spin_unlock(&lock->lock);
+
+    lock_enter(&lock->lock.debug);
 }
 
 /*
@@ -100,6 +102,8 @@ void queue_write_lock_slowpath(rwlock_t *lock)
     }
  unlock:
     spin_unlock(&lock->lock);
+
+    lock_enter(&lock->lock.debug);
 }
 
 
@@ -146,4 +150,6 @@ void _percpu_write_lock(percpu_rwlock_t **per_cpudata,
         /* Give the coherency fabric a break. */
         cpu_relax();
     };
+
+    lock_enter(&percpu_rwlock->rwlock.lock.debug);
 }
diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
index 62c83aaa6a..84996c3fbc 100644
--- a/xen/common/spinlock.c
+++ b/xen/common/spinlock.c
@@ -1,5 +1,8 @@
+#include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -11,11 +14,77 @@
 
 #ifdef CONFIG_DEBUG_LOCKS
 
+/* Max. number of entries in locks_taken array. */
+static unsigned int __ro_after_init lock_depth_size = 64;
+integer_param("lock-depth-size", lock_depth_size);
+
+/*
+ * Array of addresses of taken locks.
+ * nr_locks_taken is the index after the last entry. As locks tend to be
+ * nested cleanly, when freeing a lock it will probably be the one before
+ * nr_locks_taken, and new entries can be entered at that index. It is fine
+ * for a lock to be released out of order, though.
+ */
+static DEFINE_PER_CPU(const union lock_debug **, locks_taken);
+static DEFINE_PER_CPU(unsigned int, nr_locks_taken);
+static bool __read_mostly max_depth_reached;
+
 static atomic_t spin_debug __read_mostly = ATOMIC_INIT(0);
 
+static int cf_check cpu_lockdebug_callback(struct notifier_block *nfb,
+                                           unsigned long action,
+                                           void *hcpu)
+{
+    unsigned int cpu = (unsigned long)hcpu;
+
+    switch ( action )
+    {
+    case CPU_UP_PREPARE:
+        if ( !per_cpu(locks_taken, cpu) )
+            per_cpu(locks_taken, cpu) = xzalloc_array(const union lock_debug *,
+                                                      lock_depth_size);
+        if ( !per_cpu(locks_taken, cpu) )
+            printk(XENLOG_WARNING
+                   "cpu %u: failed to allocate lock recursion check area\n",
+                   cpu);
+        break;
+
+    case CPU_UP_CANCELED:
+    case CPU_DEAD:
+        XFREE(per_cpu(locks_taken, cpu));
+        break;
+
+    default:
+        break;
+    }
+
+    return 0;
+}
+
+static struct notifier_block cpu_lockdebug_nfb = {
+    .notifier_call = cpu_lockdebug_callback,
+};
+
+static int __init cf_check lockdebug_init(void)
+{
+    if ( lock_depth_size )
+    {
+        register_cpu_notifier(&cpu_lockdebug_nfb);
+        cpu_lockdebug_callback(&cpu_lockdebug_nfb, CPU_UP_PREPARE,
+                               (void *)(unsigned long)smp_processor_id());
+    }
+
+    return 0;
+}
+presmp_initcall(lockdebug_init);
+
 void check_lock(union lock_debug *debug, bool try)
 {
     bool irq_safe = !local_irq_is_enabled();
+    unsigned int cpu = smp_processor_id();
+    const union lock_debug *const *taken = per_cpu(locks_taken, cpu);
+    unsigned int nr_taken = per_cpu(nr_locks_taken, cpu);
+    unsigned int i;
 
     BUILD_BUG_ON(LOCK_DEBUG_PAD_BITS <= 0);
 
@@ -63,6 +132,16 @@ void check_lock(union lock_debug *debug, bool try)
             BUG();
         }
     }
+
+    if ( try )
+        return;
+
+    for ( i = 0; i < nr_taken; i++ )
+        if ( taken[i] == debug )
+        {
+            printk("CHECKLOCK FAILURE: lock at %p taken recursively\n", debug);
+            BUG();
+        }
 }
 
 static void check_barrier(union lock_debug *debug)
@@ -84,15 +163,81 @@ static void check_barrier(union lock_debug *debug)
     BUG_ON(!local_irq_is_enabled() && !debug->irq_safe);
 }
 
+void lock_enter(const union lock_debug *debug)
+{
+    unsigned int cpu = smp_processor_id();
+    const union lock_debug **taken = per_cpu(locks_taken, cpu);
+    unsigned int *nr_taken = &per_cpu(nr_locks_taken, cpu);
+    unsigned long flags;
+
+    if ( !taken )
+        return;
+
+    local_irq_save(flags);
+
+    if ( *nr_taken < lock_depth_size )
+        taken[(*nr_taken)++] = debug;
+    else if ( !max_depth_reached )
+    {
+        max_depth_reached = true;
+        printk("CHECKLOCK max lock depth %u reached!\n", lock_depth_size);
+        WARN();
+    }
+
+    local_irq_restore(flags);
+}
+
+void lock_exit(const union lock_debug *debug)
+{
+    unsigned int cpu = smp_processor_id();
+    const union lock_debug **taken = per_cpu(locks_taken, cpu);
+    unsigned int *nr_taken = &per_cpu(nr_locks_taken, cpu);
+    unsigned int i;
+    unsigned long flags;
+
+    if ( !taken )
+        return;
+
+    local_irq_save(flags);
+
+    for ( i = *nr_taken; i > 0; i-- )
+    {
+        if ( taken[i - 1] == debug )
+        {
+            memmove(taken + i - 1, taken + i,
+                    (*nr_taken - i) * sizeof(*taken));
+            (*nr_taken)--;
+            taken[*nr_taken] = NULL;
+
+            local_irq_restore(flags);
+
+            return;
+        }
+    }
+
+    if ( !max_depth_reached )
+    {
+        printk("CHECKLOCK released lock at %p not recorded!\n", debug);
+        WARN();
+    }
+
+    local_irq_restore(flags);
+}
+
 static void got_lock(union lock_debug *debug)
 {
     debug->cpu = smp_processor_id();
+
+    lock_enter(debug);
 }
 
 static void rel_lock(union lock_debug *debug)
 {
     if ( atomic_read(&spin_debug) > 0 )
         BUG_ON(debug->cpu != smp_processor_id());
+
+    lock_exit(debug);
+
     debug->cpu = SPINLOCK_NO_CPU;
 }
 
diff --git a/xen/include/xen/rwlock.h b/xen/include/xen/rwlock.h
index 0cc9167715..b8d52a5aa9 100644
--- a/xen/include/xen/rwlock.h
+++ b/xen/include/xen/rwlock.h
@@ -66,7 +66,10 @@ static inline int _read_trylock(rwlock_t *lock)
          * arch_lock_acquire_barrier().
          */
         if ( likely(_can_read_lock(cnts)) )
+        {
+            lock_enter(&lock->lock.debug);
             return 1;
+        }
         atomic_sub(_QR_BIAS, &lock->cnts);
     }
     preempt_enable();
@@ -91,6 +94,7 @@ static inline void _read_lock(rwlock_t *lock)
     {
         /* The slow path calls check_lock() via spin_lock(). */
         check_lock(&lock->lock.debug, false);
+        lock_enter(&lock->lock.debug);
         return;
     }
 
@@ -123,6 +127,8 @@ static inline unsigned long _read_lock_irqsave(rwlock_t *lock)
  */
 static inline void _read_unlock(rwlock_t *lock)
 {
+    lock_exit(&lock->lock.debug);
+
     arch_lock_release_barrier();
     /*
      * Atomically decrement the reader count
@@ -170,6 +176,7 @@ static inline void _write_lock(rwlock_t *lock)
     {
         /* The slow path calls check_lock() via spin_lock(). */
         check_lock(&lock->lock.debug, false);
+        lock_enter(&lock->lock.debug);
         return;
     }
 
@@ -215,6 +222,8 @@ static inline int _write_trylock(rwlock_t *lock)
         return 0;
     }
 
+    lock_enter(&lock->lock.debug);
+
     /*
      * atomic_cmpxchg() is a full barrier so no need for an
      * arch_lock_acquire_barrier().
@@ -225,6 +234,9 @@ static inline void _write_unlock(rwlock_t *lock)
 {
     ASSERT(_is_write_locked_by_me(atomic_read(&lock->cnts)));
+
+    lock_exit(&lock->lock.debug);
+
     arch_lock_release_barrier();
     atomic_and(~(_QW_CPUMASK | _QW_WMASK), &lock->cnts);
     preempt_enable();
 }
@@ -343,6 +355,8 @@ static inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
         /* All other paths have implicit check_lock() calls via read_lock(). */
         check_lock(&percpu_rwlock->rwlock.lock.debug, false);
     }
+
+    lock_enter(&percpu_rwlock->rwlock.lock.debug);
 }
 
 static inline void _percpu_read_unlock(percpu_rwlock_t **per_cpudata,
@@ -353,6 +367,9 @@ static inline void _percpu_read_unlock(percpu_rwlock_t **per_cpudata,
 
     /* Verify the read lock was taken for this lock */
     ASSERT(this_cpu_ptr(per_cpudata) != NULL);
+
+    lock_exit(&percpu_rwlock->rwlock.lock.debug);
+
     /*
      * Detect using a second percpu_rwlock_t simulatenously and fallback
      * to standard read_unlock.
@@ -379,6 +396,9 @@ static inline void _percpu_write_unlock(percpu_rwlock_t **per_cpudata,
 
     ASSERT(percpu_rwlock->writer_activating);
     percpu_rwlock->writer_activating = 0;
+
+    lock_exit(&percpu_rwlock->rwlock.lock.debug);
+
     write_unlock(&percpu_rwlock->rwlock);
 }
 
diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
index 961891bea4..2fa6ba3654 100644
--- a/xen/include/xen/spinlock.h
+++ b/xen/include/xen/spinlock.h
@@ -22,12 +22,16 @@ union lock_debug {
 };
 #define _LOCK_DEBUG { LOCK_DEBUG_INITVAL }
 void check_lock(union lock_debug *debug, bool try);
+void lock_enter(const union lock_debug *debug);
+void lock_exit(const union lock_debug *debug);
 void spin_debug_enable(void);
 void spin_debug_disable(void);
 #else
 union lock_debug { };
 #define _LOCK_DEBUG { }
 #define check_lock(l, t) ((void)0)
+#define lock_enter(l) ((void)0)
+#define lock_exit(l) ((void)0)
 #define spin_debug_enable() ((void)0)
 #define spin_debug_disable() ((void)0)
 #endif
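
For illustration only (not part of the patch): a minimal sketch of the reader-side
nesting the new check turns into an immediate crash instead of a latent deadlock.
The read_lock()/read_unlock() accessors are the ones touched above; the lock name,
initializer macro, and calling context are made up for the example:

    /* Hypothetical example, assuming a rwlock defined via DEFINE_RWLOCK(). */
    static DEFINE_RWLOCK(demo_lock);

    static void demo_nested_reader(void)
    {
        read_lock(&demo_lock);      /* first acquisition is fine */
        /*
         * If another CPU issues write_lock(&demo_lock) at this point, new
         * readers are held off and the nested read_lock() below would hang.
         * With CONFIG_DEBUG_LOCKS the recursion is reported right away:
         * "CHECKLOCK FAILURE: lock at ... taken recursively", followed by BUG().
         */
        read_lock(&demo_lock);
        read_unlock(&demo_lock);
        read_unlock(&demo_lock);
    }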