From patchwork Tue May 10 20:42:02 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ming Lin
X-Patchwork-Id: 9063201
Message-ID: <1462912922.23006.3.camel@ssi>
Subject: [RFC PATCH] IB/mlx5: set correct gid_tbl_len for MAD_IFC
From: Ming Lin
To: linux-rdma@vger.kernel.org
Cc: sagi@grimberg.me, Eli Cohen, Or Gerlitz
Date: Tue, 10 May 2016 13:42:02 -0700

Here is a bug with mlx5_ib.

commit d603c809ef91fa2d211bde5e95be417847410379
Author: Eli Cohen
Date:   Fri Mar 11 22:58:35 2016 +0200

    IB/mlx5: Fix decision on using MAD_IFC

This commit causes the WARN below, because "ix" is -1:

void ib_cache_gid_set_default_gid(struct ib_device *ib_dev, u8 port,
...
		/* Coudn't find default GID location */
		WARN_ON(ix < 0);

WARNING: CPU: 1 PID: 2651 at /home/mlin/linux/drivers/infiniband/core/cache.c:717 ib_cache_gid_set_default_gid+0x2f8/0x340 [ib_core]
[ 394.725187] CPU: 1 PID: 2651 Comm: modprobe Tainted: G OE 4.6.0-rc3+ #195
[ 394.734464] Hardware name: Dell Inc. OptiPlex 7010/0YXT71, BIOS A15 08/12/2013
[ 394.743131] 0000000000000000 ffff88006791b848 ffffffff8132996a 0000000000000000
[ 394.752045] 0000000000000000 ffff88006791b888 ffffffff8106a7c7 000002cd00000008
[ 394.761426] 0000000000000000 0000000000000001 ffff880063028780 ffff880060d7c000
[ 394.770370] Call Trace:
[ 394.774749] dump_stack+0x63/0x89
[ 394.781582] __warn+0xc7/0xf0
[ 394.788325] warn_slowpath_null+0x18/0x20
[ 394.795732] ib_cache_gid_set_default_gid+0x2f8/0x340 [ib_core]
[ 394.804556] ? pick_next_task_fair+0x367/0x490
[ 394.811923] ? __schedule+0x660/0x770
[ 394.818487] add_netdev_ips+0xaf/0xc0 [ib_core]
[ 394.825935] enum_all_gids_of_dev_cb+0x85/0xc0 [ib_core]
[ 394.834155] ? rdma_protocol_roce_eth_encap+0x20/0x20 [ib_core]
[ 394.842993] ib_enum_roce_netdev+0xe2/0x100 [ib_core]
[ 394.850959] ? is_eth_port_of_netdev+0x90/0x90 [ib_core]
[ 394.859193] roce_rescan_device+0x1c/0x20 [ib_core]
[ 394.866981] ib_cache_setup_one+0xeb/0x400 [ib_core]
[ 394.874851] ib_register_device+0x2d9/0x500 [ib_core]
[ 394.882807] mlx5_ib_add+0xad1/0x1370 [mlx5_ib]
[ 394.890211] ? ttwu_do_activate.constprop.81+0x58/0x60
[ 394.898212] ? __alloc_workqueue_key+0x1f4/0x540
[ 394.905696] mlx5_add_device+0x3c/0xa0 [mlx5_core]
[ 394.913340] ? 0xffffffffc09e3000
[ 394.919516] mlx5_register_interface+0x6c/0xa0 [mlx5_core]
[ 394.927858] mlx5_ib_init+0x35/0x4b [mlx5_ib]
[ 394.935059] do_one_initcall+0xc8/0x1f0
[ 394.941734] ? __vunmap+0x80/0xd0
[ 394.947875] do_init_module+0x56/0x1c8
[ 394.954450] load_module+0x1dae/0x2670
[ 394.961034] ? __symbol_put+0x50/0x50
[ 394.967543] SYSC_finit_module+0xa9/0xd0
[ 394.974302] SyS_finit_module+0x9/0x10
[ 394.980878] entry_SYSCALL_64_fastpath+0x1e/0xa8
[ 394.988336] ---[ end trace df64015bed03617a ]---
[ 395.007774] BUG: unable to handle kernel paging request at ffffffffffffffe0
[ 395.302076] Call Trace:
[ 395.305549] ? __warn+0xa0/0xf0
[ 395.311550] ib_cache_gid_set_default_gid+0x284/0x340 [ib_core]
[ 395.320335] ? __schedule+0x660/0x770
[ 395.326868] add_netdev_ips+0xaf/0xc0 [ib_core]
[ 395.334268] enum_all_gids_of_dev_cb+0x85/0xc0 [ib_core]
[ 395.342452] ? rdma_protocol_roce_eth_encap+0x20/0x20 [ib_core]
[ 395.351239] ib_enum_roce_netdev+0xe2/0x100 [ib_core]
[ 395.359167] ? is_eth_port_of_netdev+0x90/0x90 [ib_core]
[ 395.367353] roce_rescan_device+0x1c/0x20 [ib_core]
[ 395.375115] ib_cache_setup_one+0xeb/0x400 [ib_core]
[ 395.382949] ib_register_device+0x2d9/0x500 [ib_core]
[ 395.390869] mlx5_ib_add+0xad1/0x1370 [mlx5_ib]
[ 395.398289] ? ttwu_do_activate.constprop.81+0x58/0x60
[ 395.406318] ? __alloc_workqueue_key+0x1f4/0x540
[ 395.413806] mlx5_add_device+0x3c/0xa0 [mlx5_core]
[ 395.421467] ? 0xffffffffc09e3000
[ 395.427644] mlx5_register_interface+0x6c/0xa0 [mlx5_core]
[ 395.436002] mlx5_ib_init+0x35/0x4b [mlx5_ib]
[ 395.443222] do_one_initcall+0xc8/0x1f0
[ 395.449938] ? __vunmap+0x80/0xd0
[ 395.456114] do_init_module+0x56/0x1c8
[ 395.462722] load_module+0x1dae/0x2670
[ 395.469324] ? __symbol_put+0x50/0x50
[ 395.475872] SYSC_finit_module+0xa9/0xd0
[ 395.482656] SyS_finit_module+0x9/0x10
[ 395.489252] entry_SYSCALL_64_fastpath+0x1e/0xa8

Instead of reverting the commit, I tried to find the root cause.

ib_cache_gid_set_default_gid() calls find_gid():

static int find_gid(struct ib_gid_table *table, const union ib_gid *gid,
		    const struct ib_gid_attr *val, bool default_gid,
		    unsigned long mask, int *pempty)
{
	int i = 0;
	int found = -1;
	int empty = pempty ? -1 : 0;

	while (i < table->sz && (found < 0 || empty < 0)) {

find_gid() returns -1 because table->sz is 0, so the loop never executes
and "found" keeps its initial value.

static int _gid_table_setup_one(struct ib_device *ib_dev)
{
	u8 port;
	struct ib_gid_table **table;
	int err = 0;

	table = kcalloc(ib_dev->phys_port_cnt, sizeof(*table), GFP_KERNEL);

	if (!table) {
		pr_warn("failed to allocate ib gid cache for %s\n",
			ib_dev->name);
		return -ENOMEM;
	}

	for (port = 0; port < ib_dev->phys_port_cnt; port++) {
		u8 rdma_port = port + rdma_start_port(ib_dev);

		table[port] =
			alloc_gid_table(
				ib_dev->port_immutable[rdma_port].gid_tbl_len);

Each "table" entry is allocated by alloc_gid_table(), and debugging shows
that ib_dev->port_immutable[rdma_port].gid_tbl_len is 0.

"gid_tbl_len" is set in mlx5_query_mad_ifc_port():

int mlx5_query_mad_ifc_port(struct ib_device *ibdev, u8 port,
			    struct ib_port_attr *props)
{
...
	props->gid_tbl_len = out_mad->data[50];

Debugging shows that out_mad->data[50] is 0.

So here is the "temporary" patch.
I just copied it from mlx5_query_hca_port().

---
diff --git a/drivers/infiniband/hw/mlx5/mad.c b/drivers/infiniband/hw/mlx5/mad.c
index 1534af1..ef19b5c 100644
--- a/drivers/infiniband/hw/mlx5/mad.c
+++ b/drivers/infiniband/hw/mlx5/mad.c
@@ -534,7 +534,7 @@ int mlx5_query_mad_ifc_port(struct ib_device *ibdev, u8 port,
 	props->state = out_mad->data[32] & 0xf;
 	props->phys_state = out_mad->data[33] >> 4;
 	props->port_cap_flags = be32_to_cpup((__be32 *)(out_mad->data + 20));
-	props->gid_tbl_len = out_mad->data[50];
+	props->gid_tbl_len = mlx5_get_gid_table_len(MLX5_CAP_GEN(mdev, gid_table_size));
 	props->max_msg_sz = 1 << MLX5_CAP_GEN(mdev, log_max_msg);
 	props->pkey_tbl_len = mdev->port_caps[port - 1].pkey_table_len;
 	props->bad_pkey_cntr = be16_to_cpup((__be16 *)(out_mad->data + 46));
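
For anyone who wants to see the failure mode without the hardware, the following is a minimal user-space sketch of the find_gid() loop quoted in the analysis. It is not the kernel code: struct toy_gid_table and toy_find_gid() are hypothetical stand-ins, GIDs are reduced to plain ints with 0 meaning "empty slot", and the val, default_gid and mask parameters are ignored. It only illustrates that a table with sz == 0 can never yield anything but -1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ib_gid_table: only what the sketch needs. */
struct toy_gid_table {
	int sz;             /* number of slots; models gid_tbl_len */
	const int *entries; /* toy GIDs, 0 marks an empty slot */
};

/*
 * Simplified model of the find_gid() loop.
 * Returns the index of the slot holding gid, or -1 if there is none.
 * If pempty is non-NULL, it receives the first empty slot index, or -1.
 */
static int toy_find_gid(const struct toy_gid_table *table, int gid, int *pempty)
{
	int i = 0;
	int found = -1;
	int empty = pempty ? -1 : 0;

	assert(table != NULL);

	/* With sz == 0 this condition is false on entry, so found stays -1. */
	while (i < table->sz && (found < 0 || empty < 0)) {
		int curr = i++;

		if (empty < 0 && table->entries[curr] == 0)
			empty = curr;
		if (found < 0 && table->entries[curr] == gid)
			found = curr;
	}

	if (pempty)
		*pempty = empty;
	return found;
}
```

Calling toy_find_gid() on a table with sz set to 0 returns -1 and reports no empty slot, which corresponds to the "ix < 0" condition that trips the WARN_ON. With a non-zero sz, the same loop finds either a matching or an empty slot.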