From patchwork Fri Aug 31 20:00:40 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dongsu Park
X-Patchwork-Id: 1394381
From: dongsu.park@profitbricks.com
To: bvanassche@acm.org
Cc: dillowda@ornl.gov, roland@kernel.org, sean.hefty@intel.com,
 hal.rosenstock@gmail.com, JBottomley@parallels.com,
 linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-scsi@vger.kernel.org, Dongsu Park
Subject: [PATCH 4/5] ib_srp: check if rport->lld_data is NULL before removing rport
Date: Fri, 31 Aug 2012 22:00:40 +0200
Message-Id: <1346443241-24844-5-git-send-email-dongsu.park@profitbricks.com>
X-Mailer: git-send-email 1.7.11.1
In-Reply-To: <1346443241-24844-1-git-send-email-dongsu.park@profitbricks.com>
References: <1346443241-24844-1-git-send-email-dongsu.park@profitbricks.com>
Sender: linux-rdma-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-rdma@vger.kernel.org

From: Dongsu Park

After an rport has been deleted through rport_delete(), rport->lld_data has
to be set to NULL. In addition, both srp_rport_delete() and
rport_dev_loss_timedout() must check whether rport->lld_data is NULL before
accessing rport->lld_data or any of the rport's target data.

Without this patch, the initiator's kernel can crash with the call trace
below, especially when remote ports are deleted and the IB link goes down.

How to reproduce:

1. Configure 500+ vdisks on the target, and connect the initiator.
2. Exchange data intensively, which works well.
3. (On the initiator) Occasionally delete an SRP remote port, e.g.
   # echo "1" > /sys/class/srp_remote_ports/port-6\:1/delete
   Then configure the SRP target again.
4. (On the target) Disable the InfiniBand interface, and enable it again.
5. Repeat steps 3 and 4.

Then the initiator's kernel suddenly crashes.
(but not always)

Kernel Call Trace:

BUG: unable to handle kernel paging request at 0000000000010001
IP: [] strnlen+0x5/0x40
PGD 212fea067 PUD 2162f8067 PMD 0
Oops: 0000 [#1] SMP
CPU 0
Pid: 2311, comm: kworker/0:2 Not tainted 3.2.8 #1 Supermicro H8DGU/H8DGU
RIP: 0010:[] [] strnlen+0x5/0x40
Process kworker/0:2 (pid: 2311, threadinfo ffff880215fe2000, task ffff88020f2ce540)
Call Trace:
 [] ? string+0x4c/0xe0
 [] ? vsnprintf+0x1ed/0x5b0
 [] ? do_srp_rport_del+0x30/0x30 [scsi_transport_srp]
 [] ? vscnprintf+0x9/0x20
 [] ? vprintk+0xaf/0x440
 [] ? next_online_pgdat+0x20/0x50
 [] ? next_zone+0x30/0x40
 [] ? refresh_cpu_vm_stats+0xf0/0x160
 [] ? do_srp_rport_del+0x30/0x30 [scsi_transport_srp]
 [] ? printk+0x40/0x4a
 [] ? rport_dev_loss_timedout+0x2d/0xa0 [scsi_transport_srp]
 [] ? process_one_work+0x113/0x470
 [] ? worker_thread+0x163/0x3e0
 [] ? manage_workers+0x200/0x200
 [] ? manage_workers+0x200/0x200
 [] ? kthread+0x96/0xa0
 [] ? kernel_thread_helper+0x4/0x10
 [] ? kthread_worker_fn+0x180/0x180
 [] ? gs_change+0x13/0x13
RIP [] strnlen+0x5/0x40
 RSP
CR2: 0000000000010001
---[ end trace d55b61cd78c54a0a ]---

IP: [] kthread_data+0x7/0x10
Oops: 0000 [#2] SMP
CPU 3
Pid: 16745, comm: kworker/3:4 Tainted: G D O 3.2.8-pserver+ #51 System manufacturer System Product Name/M4A89GTD-PRO
RIP: 0010:[] [] kthread_data+0x7/0x10
Process kworker/3:4 (pid: 16745, threadinfo ffff8801f8162000, task ffff88020ff91440)
Call Trace:
 [] ? wq_worker_sleeping+0x8/0x90
 [] ? __schedule+0x432/0x7e0
 [] ? do_exit+0x5d4/0x8a0
 [] ? printk+0x40/0x4a
 [] ? oops_end+0xa3/0xf0
 [] ? no_context+0xfd/0x270
 [] ? check_preempt_wakeup+0x155/0x1d0
 [] ? do_page_fault+0x31a/0x440
 [] ? select_task_rq_fair+0x432/0x9d0
 [] ? cpumask_next_and+0x22/0x40
 [] ? find_busiest_group+0x1f3/0xb30
 [] ? page_fault+0x25/0x30
 [] ? strnlen+0x5/0x40
 [] ? string+0x4c/0xe0
 [] ? vsnprintf+0x1ed/0x5b0
 [] ? do_srp_rport_del+0x30/0x30 [scsi_transport_srp]
 [] ? vscnprintf+0x9/0x20
 [] ? vprintk+0xaf/0x440
 [] ? ns_to_timeval+0x9/0x40
 [] ? queue_delayed_work_on+0x157/0x170
 [] ? do_srp_rport_del+0x30/0x30 [scsi_transport_srp]
 [] ? printk+0x40/0x4a
 [] ? rport_dev_loss_timedout+0x2d/0xa0 [scsi_transport_srp]
 [] ? cpufreq_governor_dbs+0x4b0/0x4b0
 [] ? process_one_work+0x113/0x470
 [] ? worker_thread+0x163/0x3e0
 [] ? manage_workers+0x200/0x200
 [] ? manage_workers+0x200/0x200
 [] ? kthread+0x96/0xa0
 [] ? kernel_thread_helper+0x4/0x10
 [] ? kthread_worker_fn+0x180/0x180
 [] ? gs_change+0x13/0x13
RIP [] kthread_data+0x7/0x10
 RSP
CR2: fffffffffffffff8
---[ end trace cab7f2c38a7f7ba9 ]---

Signed-off-by: Dongsu Park
---
 drivers/infiniband/ulp/srp/ib_srp.c | 12 +++++++++++-
 drivers/scsi/scsi_transport_srp.c   |  6 ++++++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 1b274484..ba7bbfd 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -647,9 +647,19 @@ static void srp_remove_work(struct work_struct *work)
 
 static void srp_rport_delete(struct srp_rport *rport)
 {
-	struct srp_target_port *target = rport->lld_data;
+	struct srp_target_port *target;
+
+	if (!rport->lld_data) {
+		pr_warn("skipping srp_rport_delete. rport->lld_data=%p\n",
+			rport->lld_data);
+		return;
+	}
+
+	target = rport->lld_data;
 
 	srp_queue_remove_work(target);
+
+	rport->lld_data = NULL;
 }
 
 /**
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index af3cb56..915b355 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -272,6 +272,12 @@ static void rport_dev_loss_timedout(struct work_struct *work)
 	struct Scsi_Host *shost;
 	struct srp_internal *i;
 
+	if (!rport->lld_data) {
+		pr_warn("skipping rport_delete, rport->lld_data=%p\n",
+			rport->lld_data);
+		return;
+	}
+
 	pr_err("SRP transport: dev_loss_tmo (%ds) expired - removing %s.\n",
 	       rport->dev_loss_tmo, dev_name(&rport->dev));
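
For readers who want to see the guard-then-clear idea from this patch outside the kernel, here is a minimal userspace sketch. It is not the driver code: struct mock_target, struct mock_rport, mock_queue_remove_work() and mock_rport_delete() are hypothetical stand-ins, the return value is added only so the behaviour can be observed, and the sketch is single-threaded, so it says nothing about locking or workqueue ordering in the real driver.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct srp_target_port. */
struct mock_target {
	int removals_queued;	/* how many times removal was queued */
};

/* Hypothetical stand-in for struct srp_rport. */
struct mock_rport {
	void *lld_data;		/* LLD private pointer; NULL once deleted */
};

/* Stand-in for srp_queue_remove_work(): only counts invocations. */
static void mock_queue_remove_work(struct mock_target *target)
{
	target->removals_queued++;
}

/*
 * Follows the shape of the patched srp_rport_delete(): return early if
 * lld_data is already NULL, otherwise queue removal and clear lld_data.
 * Returns 1 if removal was queued, 0 if the call was skipped.
 */
static int mock_rport_delete(struct mock_rport *rport)
{
	struct mock_target *target;

	if (!rport->lld_data) {
		fprintf(stderr, "skipping delete, lld_data is NULL\n");
		return 0;
	}

	target = rport->lld_data;
	mock_queue_remove_work(target);
	rport->lld_data = NULL;
	return 1;
}
```

Under these assumptions, calling mock_rport_delete() twice on the same rport queues removal only once; the second call takes the early return instead of dereferencing a stale pointer.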