From patchwork Fri Oct 4 02:24:06 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexei Starovoitov X-Patchwork-Id: 2987011 Return-Path: X-Original-To: patchwork-linux-arm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 135959F288 for ; Fri, 4 Oct 2013 02:29:55 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id ECA5820434 for ; Fri, 4 Oct 2013 02:29:53 +0000 (UTC) Received: from casper.infradead.org (casper.infradead.org [85.118.1.10]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8848B20207 for ; Fri, 4 Oct 2013 02:29:52 +0000 (UTC) Received: from merlin.infradead.org ([2001:4978:20e::2]) by casper.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1VRv9J-00056h-3q; Fri, 04 Oct 2013 02:29:49 +0000 Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1VRv4R-0007CN-9H; Fri, 04 Oct 2013 02:24:47 +0000 Received: from mail-pd0-f182.google.com ([209.85.192.182]) by merlin.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux)) id 1VRv4N-0007BT-Kp for linux-arm-kernel@lists.infradead.org; Fri, 04 Oct 2013 02:24:44 +0000 Received: by mail-pd0-f182.google.com with SMTP id r10so3343967pdi.27 for ; Thu, 03 Oct 2013 19:24:12 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=X0kAJfktVlmH1WQnL51tCOyP6W1UD1cpMV8Bg/f/UnY=; b=MOKCC9vQ5ThJYzwWeXzSZvJzmq+NDTvhH/qBZJMcTJo+JmzIn4YUXzlr/cYtzDCMe2 JtX6fKE510cdRqBYzUSNwNcKJpYSuvvvne0oAM618w5ZZRXvIx6tlJYga22maHyEXL/V WxohccBO3mu4kKaQm6l4lThTBFm4iRYnV1sFo7LvOPs7eJIwRCULqzlrHY8mU5td79kF NSlDPAlTOFjfP15lJWggNRIgPWC5RlTwhfsyUtfWIu/0vsyInxCvJBs96tZOBntRlha/ tv8q/d1S/AVxS5U/II24XPZHe+p0iZGKS8+tGKPj1D8cdtBrRCPn0dOGi3I427hfc3Wh YRFA== X-Gm-Message-State: ALoCoQn5DCEBjJLalnGK5avktujetW9qfHNIxdFQYZGoDyRQt+/rO1SJHMR6jruUV7/me+XtK4Dg X-Received: by 10.66.187.34 with SMTP id fp2mr12800662pac.12.1380853451064; Thu, 03 Oct 2013 19:24:11 -0700 (PDT) Received: from pg-vmw-gw1.plumgrid.com ([67.21.3.149]) by mx.google.com with ESMTPSA id fa4sm14175142pab.17.1969.12.31.16.00.00 (version=TLSv1.1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 03 Oct 2013 19:24:10 -0700 (PDT) From: Alexei Starovoitov To: "David S. Miller" Subject: [PATCH v3 net-next] fix unsafe set_memory_rw from softirq Date: Thu, 3 Oct 2013 19:24:06 -0700 Message-Id: <1380853446-30537-1-git-send-email-ast@plumgrid.com> X-Mailer: git-send-email 1.7.9.5 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20131003_222443_840089_DAA0D1E2 X-CRM114-Status: GOOD ( 20.29 ) X-Spam-Score: 1.5 (+) Cc: Benjamin Herrenschmidt , Heiko Carstens , Eric Dumazet , Paul Mackerras , "H. Peter Anvin" , sparclinux@vger.kernel.org, Nicolas Dichtel , Alexei Starovoitov , linux-s390@vger.kernel.org, Russell King , x86@kernel.org, James Morris , Ingo Molnar , Alexey Kuznetsov , "Paul E. McKenney" , Xi Wang , Matt Evans , Thomas Gleixner , linux-arm-kernel@lists.infradead.org, Stelian Nirlu , Nicolas Schichan , Hideaki YOSHIFUJI , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Mircea Gherzan , Daniel Borkmann , Martin Schwidefsky , linux390@de.ibm.com, linuxppc-dev@lists.ozlabs.org, Patrick McHardy X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Spam-Status: No, score=-1.5 required=5.0 tests=BAYES_00,KHOP_BIG_TO_CC, RCVD_IN_DNSWL_MED, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=no version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP on x86 system with net.core.bpf_jit_enable = 1 sudo tcpdump -i eth1 'tcp port 22' causes the warning: [ 56.766097] Possible unsafe locking scenario: [ 56.766097] [ 56.780146] CPU0 [ 56.786807] ---- [ 56.793188] lock(&(&vb->lock)->rlock); [ 56.799593] [ 56.805889] lock(&(&vb->lock)->rlock); [ 56.812266] [ 56.812266] *** DEADLOCK *** [ 56.812266] [ 56.830670] 1 lock held by ksoftirqd/1/13: [ 56.836838] #0: (rcu_read_lock){.+.+..}, at: [] vm_unmap_aliases+0x8c/0x380 [ 56.849757] [ 56.849757] stack backtrace: [ 56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45 [ 56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012 [ 56.882004] ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007 [ 56.895630] ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001 [ 56.909313] ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001 [ 56.923006] Call Trace: [ 56.929532] [] dump_stack+0x55/0x76 [ 56.936067] [] print_usage_bug+0x1f7/0x208 [ 56.942445] [] ? save_stack_trace+0x2f/0x50 [ 56.948932] [] ? check_usage_backwards+0x150/0x150 [ 56.955470] [] mark_lock+0x282/0x2c0 [ 56.961945] [] __lock_acquire+0x45d/0x1d50 [ 56.968474] [] ? __lock_acquire+0x2de/0x1d50 [ 56.975140] [] ? cpumask_next_and+0x55/0x90 [ 56.981942] [] lock_acquire+0x92/0x1d0 [ 56.988745] [] ? vm_unmap_aliases+0x16a/0x380 [ 56.995619] [] _raw_spin_lock+0x41/0x50 [ 57.002493] [] ? vm_unmap_aliases+0x16a/0x380 [ 57.009447] [] vm_unmap_aliases+0x16a/0x380 [ 57.016477] [] ? vm_unmap_aliases+0x8c/0x380 [ 57.023607] [] change_page_attr_set_clr+0xc0/0x460 [ 57.030818] [] ? trace_hardirqs_on+0xd/0x10 [ 57.037896] [] ? kmem_cache_free+0xb0/0x2b0 [ 57.044789] [] ? free_object_rcu+0x93/0xa0 [ 57.051720] [] set_memory_rw+0x2f/0x40 [ 57.058727] [] bpf_jit_free+0x2c/0x40 [ 57.065577] [] sk_filter_release_rcu+0x1a/0x30 [ 57.072338] [] rcu_process_callbacks+0x202/0x7c0 [ 57.078962] [] __do_softirq+0xf7/0x3f0 [ 57.085373] [] run_ksoftirqd+0x35/0x70 cannot reuse jited filter memory, since it's readonly, so use original bpf insns memory to hold work_struct defer kfree of sk_filter until jit completed freeing tested on x86_64 and i386 Signed-off-by: Alexei Starovoitov Acked-by: Heiko Carstens Acked-by: Ingo Molnar --- arch/arm/net/bpf_jit_32.c | 1 + arch/powerpc/net/bpf_jit_comp.c | 1 + arch/s390/net/bpf_jit_comp.c | 4 +++- arch/sparc/net/bpf_jit_comp.c | 1 + arch/x86/net/bpf_jit_comp.c | 20 +++++++++++++++----- include/linux/filter.h | 11 +++++++++-- net/core/filter.c | 11 +++++++---- 7 files changed, 37 insertions(+), 12 deletions(-) diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c index f50d223..99b44e0 100644 --- a/arch/arm/net/bpf_jit_32.c +++ b/arch/arm/net/bpf_jit_32.c @@ -930,4 +930,5 @@ void bpf_jit_free(struct sk_filter *fp) { if (fp->bpf_func != sk_run_filter) module_free(NULL, fp->bpf_func); + kfree(fp); } diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c index bf56e33..2345bdb 100644 --- a/arch/powerpc/net/bpf_jit_comp.c +++ b/arch/powerpc/net/bpf_jit_comp.c @@ -691,4 +691,5 @@ void bpf_jit_free(struct sk_filter *fp) { if (fp->bpf_func != sk_run_filter) module_free(NULL, fp->bpf_func); + kfree(fp); } diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c index 7092392..a5df511 100644 --- a/arch/s390/net/bpf_jit_comp.c +++ b/arch/s390/net/bpf_jit_comp.c @@ -881,7 +881,9 @@ void bpf_jit_free(struct sk_filter *fp) struct bpf_binary_header *header = (void *)addr; if (fp->bpf_func == sk_run_filter) - return; + goto free_filter; set_memory_rw(addr, header->pages); module_free(NULL, header); +free_filter: + kfree(fp); } diff --git a/arch/sparc/net/bpf_jit_comp.c b/arch/sparc/net/bpf_jit_comp.c index 9c7be59..218b6b2 100644 --- a/arch/sparc/net/bpf_jit_comp.c +++ b/arch/sparc/net/bpf_jit_comp.c @@ -808,4 +808,5 @@ void bpf_jit_free(struct sk_filter *fp) { if (fp->bpf_func != sk_run_filter) module_free(NULL, fp->bpf_func); + kfree(fp); } diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 79c216a..1396a0a 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -772,13 +772,23 @@ out: return; } +static void bpf_jit_free_deferred(struct work_struct *work) +{ + struct sk_filter *fp = container_of((void *)work, struct sk_filter, + insns); + unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK; + struct bpf_binary_header *header = (void *)addr; + + set_memory_rw(addr, header->pages); + module_free(NULL, header); + kfree(fp); +} + void bpf_jit_free(struct sk_filter *fp) { if (fp->bpf_func != sk_run_filter) { - unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK; - struct bpf_binary_header *header = (void *)addr; - - set_memory_rw(addr, header->pages); - module_free(NULL, header); + struct work_struct *work = (struct work_struct *)fp->insns; + INIT_WORK(work, bpf_jit_free_deferred); + schedule_work(work); } } diff --git a/include/linux/filter.h b/include/linux/filter.h index a6ac848..5d66cd9 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -25,15 +25,20 @@ struct sk_filter { atomic_t refcnt; unsigned int len; /* Number of filter blocks */ + struct rcu_head rcu; unsigned int (*bpf_func)(const struct sk_buff *skb, const struct sock_filter *filter); - struct rcu_head rcu; + /* insns start right after bpf_func, so that sk_run_filter() fetches + * first insn from the same cache line that was used to call into + * sk_run_filter() + */ struct sock_filter insns[0]; }; static inline unsigned int sk_filter_len(const struct sk_filter *fp) { - return fp->len * sizeof(struct sock_filter) + sizeof(*fp); + return max(fp->len * sizeof(struct sock_filter), + sizeof(struct work_struct)) + sizeof(*fp); } extern int sk_filter(struct sock *sk, struct sk_buff *skb); @@ -67,11 +72,13 @@ static inline void bpf_jit_dump(unsigned int flen, unsigned int proglen, } #define SK_RUN_FILTER(FILTER, SKB) (*FILTER->bpf_func)(SKB, FILTER->insns) #else +#include static inline void bpf_jit_compile(struct sk_filter *fp) { } static inline void bpf_jit_free(struct sk_filter *fp) { + kfree(fp); } #define SK_RUN_FILTER(FILTER, SKB) sk_run_filter(SKB, FILTER->insns) #endif diff --git a/net/core/filter.c b/net/core/filter.c index 6438f29..ad5eaba 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -644,7 +644,6 @@ void sk_filter_release_rcu(struct rcu_head *rcu) struct sk_filter *fp = container_of(rcu, struct sk_filter, rcu); bpf_jit_free(fp); - kfree(fp); } EXPORT_SYMBOL(sk_filter_release_rcu); @@ -677,13 +676,15 @@ int sk_unattached_filter_create(struct sk_filter **pfp, { struct sk_filter *fp; unsigned int fsize = sizeof(struct sock_filter) * fprog->len; + unsigned int sk_fsize = max_t(u32, fsize, sizeof(struct work_struct)) + + sizeof(*fp); int err; /* Make sure new filter is there and in the right amounts. */ if (fprog->filter == NULL) return -EINVAL; - fp = kmalloc(fsize + sizeof(*fp), GFP_KERNEL); + fp = kmalloc(sk_fsize, GFP_KERNEL); if (!fp) return -ENOMEM; memcpy(fp->insns, fprog->filter, fsize); @@ -723,6 +724,8 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk) { struct sk_filter *fp, *old_fp; unsigned int fsize = sizeof(struct sock_filter) * fprog->len; + unsigned int sk_fsize = max_t(u32, fsize, sizeof(struct work_struct)) + + sizeof(*fp); int err; if (sock_flag(sk, SOCK_FILTER_LOCKED)) @@ -732,11 +735,11 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk) if (fprog->filter == NULL) return -EINVAL; - fp = sock_kmalloc(sk, fsize+sizeof(*fp), GFP_KERNEL); + fp = sock_kmalloc(sk, sk_fsize, GFP_KERNEL); if (!fp) return -ENOMEM; if (copy_from_user(fp->insns, fprog->filter, fsize)) { - sock_kfree_s(sk, fp, fsize+sizeof(*fp)); + sock_kfree_s(sk, fp, sk_fsize); return -EFAULT; }