[v2,for-next,5/7] IB/mlx4: Add IB counters table

Message ID 1444909482-17113-6-git-send-email-eranbe@mellanox.com (mailing list archive)
State Accepted

Commit Message

Eran Ben Elisha Oct. 15, 2015, 11:44 a.m. UTC
This is an infrastructure step for allocating and attaching more than
one counter to QPs on the same port. Allocate a per-port counters table
and manage insertion and removal of its counters when mlx4 IB is loaded
and unloaded.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
---
 drivers/infiniband/hw/mlx4/mad.c     | 25 ++++++++++----
 drivers/infiniband/hw/mlx4/main.c    | 63 ++++++++++++++++++++++++++++--------
 drivers/infiniband/hw/mlx4/mlx4_ib.h |  9 +++++-
 drivers/infiniband/hw/mlx4/qp.c      |  8 +++--
 4 files changed, 81 insertions(+), 24 deletions(-)
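
The commit message describes a per-port table that holds a list of counters behind a mutex, filled at load and emptied at unload. The sketch below mirrors that shape in userspace C, purely as an illustration. It assumes a pthread mutex in place of the kernel's struct mutex and a hand-rolled singly linked list in place of list_head. The names counter_node, counters_table, table_init, table_add and table_delete are hypothetical and do not exist in mlx4.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the patch's struct counter_index. */
struct counter_node {
	struct counter_node *next;
	uint32_t index;
	uint8_t allocated;	/* nonzero if this entry owns its counter */
};

/* Hypothetical userspace stand-in for the patch's struct mlx4_ib_counters. */
struct counters_table {
	struct counter_node *head, *tail;
	pthread_mutex_t mutex;	/* protects head/tail */
	uint32_t default_counter;
};

/* Must run before any other table_* call on this table. */
static void table_init(struct counters_table *t)
{
	assert(t != NULL);
	t->head = t->tail = NULL;
	t->default_counter = 0;
	pthread_mutex_init(&t->mutex, NULL);
}

/* Append one counter; returns 0 on success, -1 if allocation fails. */
static int table_add(struct counters_table *t, uint32_t index,
		     uint8_t allocated)
{
	struct counter_node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->next = NULL;
	n->index = index;
	n->allocated = allocated;

	pthread_mutex_lock(&t->mutex);
	if (t->tail)
		t->tail->next = n;
	else
		t->head = n;
	t->tail = n;
	pthread_mutex_unlock(&t->mutex);
	return 0;
}

/*
 * Free every entry. Returns how many entries were marked allocated,
 * i.e. how many a driver would also have to hand back to the hardware.
 */
static int table_delete(struct counters_table *t)
{
	struct counter_node *n, *next;
	int released = 0;

	pthread_mutex_lock(&t->mutex);
	for (n = t->head; n; n = next) {
		next = n->next;	/* read before freeing the node */
		if (n->allocated)
			released++;
		free(n);
	}
	t->head = t->tail = NULL;
	pthread_mutex_unlock(&t->mutex);
	return released;
}
```

As in the patch, teardown walks the list with a saved next pointer so that freeing the current entry does not break the iteration.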

Comments

Sagi Grimberg Dec. 24, 2015, 10:12 a.m. UTC | #1
This patch seems to generate a list corruption [1] when I test
with Doug's for-4.5 tree.

Eran, care to take a look at this?

[1]:
mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
mlx4_core: Initializing 0000:04:00.0
mlx4_core 0000:04:00.0: PCIe link speed is 8.0GT/s, device supports 8.0GT/s
mlx4_core 0000:04:00.0: PCIe link width is x8, device supports x8
<mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v2.2-1 (Feb 2014)
<mlx4_ib> mlx4_ib_add: counter index 0 for port 1 allocated 0
<mlx4_ib> mlx4_ib_add: counter index 1 for port 2 allocated 0
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81274e36>] __list_add+0x26/0xd0
PGD 46da14067 PUD 46daa0067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: mlx4_ib(+) ib_sa ib_mad mlx4_core mlx5_ib mlx5_core
ib_core ib_addr netconsole configfs nfsv3 nfs fscache cfg80211 rfkill
x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crc32c_intel
aesni_intel aes_x86_64 glue_helper lrw dm_mod gf128mul ablk_helper
cryptd iTCO_wdt iTCO_vendor_support sb_edac shpchp ipmi_si ioatdma
lpc_ich mfd_core edac_core pcspkr wmi ipmi_msghandler i2c_i801
acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4
mbcache jbd2 sd_mod isci libsas igb serio_raw ahci ptp pps_core libahci
i2c_algo_bit scsi_transport_sas i2c_core dca ipv6 autofs4 [last
unloaded: mlx5_core]
CPU: 0 PID: 1737 Comm: modprobe Not tainted 4.4.0-rc6+ #107
Hardware name: Supermicro SYS-1027R-WRF/X9DRW, BIOS 3.0a 08/08/2013
task: ffff8804673da800 ti: ffff880466694000 task.ti: ffff880466694000
RIP: 0010:[<ffffffff81274e36>]  [<ffffffff81274e36>] __list_add+0x26/0xd0
RSP: 0018:ffff880466697898  EFLAGS: 00010246
RAX: 00000000ffffffff RBX: ffff8804666978c8 RCX: ffff8804673da800
RDX: ffff88086b8539b8 RSI: 0000000000000000 RDI: ffff8804666978c8
RBP: ffff8804666978b8 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 000000000000fffe R12: ffff88086b8539b8
R13: 0000000000000000 R14: ffff88086b8539b8 R15: ffff880466697908
FS:  00007f37a02cf700(0000) GS:ffff88047fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000046b6ee000 CR4: 00000000000406f0
Stack:
  ffff8804673da800 ffff88086b8539b0 ffff8804673da800 ffff88086b8539b4
  ffff880466697958 ffffffff8154f7be ffff880466697904 0000000000000292
  ffff880466697938 ffffffff81259bc1 0000000000007f49 80000000024000c0
Call Trace:
  [<ffffffff8154f7be>] __mutex_lock_slowpath+0x6e/0x110
  [<ffffffff81259bc1>] ? ida_simple_get+0x91/0x100
  [<ffffffff811d354e>] ? kernfs_next_descendant_post+0x1e/0x90
  [<ffffffff811d3646>] ? kernfs_activate+0x86/0xf0
  [<ffffffff8154f87e>] mutex_lock+0x1e/0x40
  [<ffffffffa00fb083>] iboe_process_mad+0x73/0x180 [mlx4_ib]
  [<ffffffffa00fba36>] mlx4_ib_process_mad+0xd6/0x110 [mlx4_ib]
  [<ffffffffa06b4703>] get_perf_mad+0x103/0x140 [ib_core]
  [<ffffffffa06b4764>] get_counter_table+0x24/0x40 [ib_core]
  [<ffffffff8115846e>] ? __kmalloc+0xde/0xe0
  [<ffffffffa06b4895>] add_port+0x115/0x3f0 [ib_core]
  [<ffffffffa06b4c5e>] ib_device_register_sysfs+0xee/0x160 [ib_core]
  [<ffffffffa06b5e05>] ib_register_device+0x1d5/0x300 [ib_core]
  [<ffffffffa010282b>] mlx4_ib_add+0x78b/0xd00 [mlx4_ib]
  [<ffffffffa08027ce>] mlx4_add_device+0x3e/0xb0 [mlx4_core]
  [<ffffffffa0802957>] mlx4_register_interface+0x87/0xe0 [mlx4_core]
  [<ffffffffa0096055>] mlx4_ib_init+0x55/0x72 [mlx4_ib]
  [<ffffffffa0096000>] ? 0xffffffffa0096000
  [<ffffffff81000368>] do_one_initcall+0xa8/0x1c0
  [<ffffffff810ca5bf>] do_init_module+0x5f/0x210
  [<ffffffff810cc267>] load_module+0x5d7/0x700
  [<ffffffff810c97a0>] ? mod_sysfs_teardown+0x140/0x140
  [<ffffffff810c91f0>] ? module_sect_show+0x20/0x20
  [<ffffffff810cc44b>] SyS_finit_module+0xbb/0xf0
  [<ffffffff81551757>] entry_SYSCALL_64_fastpath+0x12/0x6a
Code: 90 90 90 90 90 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c 89 65 f0 48 
89 fb 4c 89 6d f8 4c 8b 42 08 49 89 f5 49 89 d4 49 39 f0 75 31 <4d> 8b 
45 00 4d 39 c4 75 6f 4c 39 e3 74 45 4c 39 eb 74 40 49 89
RIP  [<ffffffff81274e36>] __list_add+0x26/0xd0
  RSP <ffff880466697898>
CR2: 0000000000000000
---[ end trace 5f4fe0ca857661e6 ]---
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
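
One possible reading of the trace above, not confirmed anywhere in this thread: the fault is in __list_add, reached from mutex_lock via __mutex_lock_slowpath, with RSI equal to zero in the register dump. That is what a list head that was never initialised (still all zeroes) would look like. The userspace C sketch below only illustrates that general failure mode and the init-before-use ordering that avoids it. The names list_node, list_init, list_is_initialised and list_add_tail_checked are hypothetical, and the explicit NULL check is an illustration aid, not something the kernel's list helpers do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Circular doubly linked list node, shaped like the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

/* An empty list points at itself in both directions. */
static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* A zero-filled head has NULL links; an initialised one never does. */
static int list_is_initialised(const struct list_node *head)
{
	return head->next != NULL && head->prev != NULL;
}

/*
 * Append node at the tail. Returns -1 for a zeroed head instead of
 * dereferencing the NULL prev pointer, which is where an unchecked
 * implementation would fault.
 */
static int list_add_tail_checked(struct list_node *node,
				 struct list_node *head)
{
	struct list_node *prev;

	if (!list_is_initialised(head))
		return -1;
	prev = head->prev;
	node->next = head;
	node->prev = prev;
	prev->next = node;
	head->prev = node;
	return 0;
}
```

The point of the sketch is the ordering: any object that embeds a list head has to be initialised before the first code path that can add to it becomes reachable.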
Sagi Grimberg Dec. 24, 2015, 10:30 a.m. UTC | #2
Doug,

I also can't load the mlx5 drivers in your tree [1], but
I don't know where the failure comes from; it could be pretty much anything...

Now I'm left with no usable HW to test with :(


[1]:
mlx5_core 0000:06:00.0: firmware version: 12.14.74
mlx5_core 0000:06:00.1: firmware version: 12.14.74
mlx5_ib: Mellanox Connect-IB Infiniband driver v2.2-1 (Feb 2014)
command failed, status bad parameter(0x3), syndrome 0x7424da
command failed, status bad parameter(0x3), syndrome 0x7424da
Or Gerlitz Dec. 24, 2015, 10:34 a.m. UTC | #3
On 12/24/2015 12:12 PM, Sagi Grimberg wrote:
> This patch seems to generate a list corruption [1] when I test
> with Doug's for-4.5 tree. Eran, care to take a look at this? 

This patch is part of a series that was introduced in 4.3-rc1 [1]. Did
4.4-rc5/6 work for you before the further patches were added there?

Or.

[1]
fbfb662 IB/mlx4: Add support for blocking multicast loopback QP creation 
user flag
7b59f0f IB/mlx4: Add counter based implementation for QP multicast 
loopback block
3ba8e31 IB/mlx4: Add IB counters table
74194fb net/mlx4_en: Implement mcast loopback prevention for ETH qps
9a89283 net/mlx4_core: Add support for filtering multicast loopback
ddf9529 IB/core: Allow setting create flags in QP init attribute
6d8a749 IB/core: Extend ib_uverbs_create_qp


Sagi Grimberg Dec. 24, 2015, 10:42 a.m. UTC | #4
>> This patch seems to generate a list corruption [1] when I test
>> with Doug's for-4.5 tree. Eran, care to take a look at this?
>
> This patch is part of a series that was introduced in 4.3-rc1 [1],

Then something else broke it. Can people check their patches on Doug's
tree? At the moment it's unusable...
Or Gerlitz Dec. 24, 2015, 12:38 p.m. UTC | #5
On 12/24/2015 12:42 PM, Sagi Grimberg wrote:
>
>>> This patch seems to generate a list corruption [1] when I test
>>> with Doug's for-4.5 tree. Eran, care to take a look at this?
>>
>> This patch is part of a series that was introduced in 4.3-rc1 [1],
>
> Then something else broke it. Can people check their patches on Doug's
> tree? At the moment it's unusable...

Yes, I checked the branch up to commit 882f3b3 "Merge branches 
'4.5/Or-cleanup' and '4.5/rdma-cq' into k.o/for-4.5" and it works 
(rping, ibv_rc_pingpong on top of mlx4 VPI)
Matan Barak Dec. 24, 2015, 2:07 p.m. UTC | #6
On Thu, Dec 24, 2015 at 2:38 PM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
> On 12/24/2015 12:42 PM, Sagi Grimberg wrote:
>>
>>
>>>> This patch seems to generate a list corruption [1] when I test
>>>> with Doug's for-4.5 tree. Eran, care to take a look at this?
>>>
>>>
>>> This patch is part of a series that was introduced in 4.3-rc1 [1],
>>
>>
>> Then something else broke it. Can people check their patches on Doug's
>> tree? At the moment it's unusable...
>

Leon and I have checked Doug's tree with mlx4_ib disabled and we
didn't encounter any errors.
We ran ucmatose over an IB connection (on mlx5) and it worked flawlessly.

>
> Yes, I checked the branch up to commit 882f3b3 "Merge branches
> '4.5/Or-cleanup' and '4.5/rdma-cq' into k.o/for-4.5" and it works (rping,
> ibv_rc_pingpong on top of mlx4 VPI)
Patch

diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c
index 1cd75ff..68f2567 100644
--- a/drivers/infiniband/hw/mlx4/mad.c
+++ b/drivers/infiniband/hw/mlx4/mad.c
@@ -824,18 +824,29 @@  static int iboe_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 {
 	struct mlx4_counter counter_stats;
 	struct mlx4_ib_dev *dev = to_mdev(ibdev);
-	int err;
+	struct counter_index *tmp_counter;
+	int err = IB_MAD_RESULT_FAILURE, stats_avail = 0;
 
 	if (in_mad->mad_hdr.mgmt_class != IB_MGMT_CLASS_PERF_MGMT)
 		return -EINVAL;
 
 	memset(&counter_stats, 0, sizeof(counter_stats));
-	err = mlx4_get_counter_stats(dev->dev,
-				     dev->counters[port_num - 1].index,
-				     &counter_stats, 0);
-	if (err)
-		err = IB_MAD_RESULT_FAILURE;
-	else {
+	mutex_lock(&dev->counters_table[port_num - 1].mutex);
+	list_for_each_entry(tmp_counter,
+			    &dev->counters_table[port_num - 1].counters_list,
+			    list) {
+		err = mlx4_get_counter_stats(dev->dev,
+					     tmp_counter->index,
+					     &counter_stats, 0);
+		if (err) {
+			err = IB_MAD_RESULT_FAILURE;
+			stats_avail = 0;
+			break;
+		}
+		stats_avail = 1;
+	}
+	mutex_unlock(&dev->counters_table[port_num - 1].mutex);
+	if (stats_avail) {
 		memset(out_mad->data, 0, sizeof out_mad->data);
 		switch (counter_stats.counter_mode & 0xf) {
 		case 0:
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 38be8dc..232b104 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1249,6 +1249,22 @@  static int add_gid_entry(struct ib_qp *ibqp, union ib_gid *gid)
 	return 0;
 }
 
+static void mlx4_ib_delete_counters_table(struct mlx4_ib_dev *ibdev,
+					  struct mlx4_ib_counters *ctr_table)
+{
+	struct counter_index *counter, *tmp_count;
+
+	mutex_lock(&ctr_table->mutex);
+	list_for_each_entry_safe(counter, tmp_count, &ctr_table->counters_list,
+				 list) {
+		if (counter->allocated)
+			mlx4_counter_free(ibdev->dev, counter->index);
+		list_del(&counter->list);
+		kfree(counter);
+	}
+	mutex_unlock(&ctr_table->mutex);
+}
+
 int mlx4_ib_add_mc(struct mlx4_ib_dev *mdev, struct mlx4_ib_qp *mqp,
 		   union ib_gid *gid)
 {
@@ -2133,6 +2149,7 @@  static void *mlx4_ib_add(struct mlx4_dev *dev)
 	int num_req_counters;
 	int allocated;
 	u32 counter_index;
+	struct counter_index *new_counter_index = NULL;
 
 	pr_info_once("%s", mlx4_ib_version);
 
@@ -2304,6 +2321,11 @@  static void *mlx4_ib_add(struct mlx4_dev *dev)
 	if (init_node_data(ibdev))
 		goto err_map;
 
+	for (i = 0; i < ibdev->num_ports; ++i) {
+		mutex_init(&ibdev->counters_table[i].mutex);
+		INIT_LIST_HEAD(&ibdev->counters_table[i].counters_list);
+	}
+
 	num_req_counters = mlx4_is_bonded(dev) ? 1 : ibdev->num_ports;
 	for (i = 0; i < num_req_counters; ++i) {
 		mutex_init(&ibdev->qp1_proxy_lock[i]);
@@ -2322,15 +2344,34 @@  static void *mlx4_ib_add(struct mlx4_dev *dev)
 			counter_index = mlx4_get_default_counter_index(dev,
 								       i + 1);
 		}
-		ibdev->counters[i].index = counter_index;
-		ibdev->counters[i].allocated = allocated;
+		new_counter_index = kmalloc(sizeof(*new_counter_index),
+					    GFP_KERNEL);
+		if (!new_counter_index) {
+			if (allocated)
+				mlx4_counter_free(ibdev->dev, counter_index);
+			goto err_counter;
+		}
+		new_counter_index->index = counter_index;
+		new_counter_index->allocated = allocated;
+		list_add_tail(&new_counter_index->list,
+			      &ibdev->counters_table[i].counters_list);
+		ibdev->counters_table[i].default_counter = counter_index;
 		pr_info("counter index %d for port %d allocated %d\n",
 			counter_index, i + 1, allocated);
 	}
 	if (mlx4_is_bonded(dev))
 		for (i = 1; i < ibdev->num_ports ; ++i) {
-			ibdev->counters[i].index = ibdev->counters[0].index;
-			ibdev->counters[i].allocated = 0;
+			new_counter_index =
+					kmalloc(sizeof(struct counter_index),
+						GFP_KERNEL);
+			if (!new_counter_index)
+				goto err_counter;
+			new_counter_index->index = counter_index;
+			new_counter_index->allocated = 0;
+			list_add_tail(&new_counter_index->list,
+				      &ibdev->counters_table[i].counters_list);
+			ibdev->counters_table[i].default_counter =
+								counter_index;
 		}
 
 	mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB)
@@ -2439,12 +2480,9 @@  err_steer_qp_release:
 		mlx4_qp_release_range(dev, ibdev->steer_qpn_base,
 				      ibdev->steer_qpn_count);
 err_counter:
-	for (i = 0; i < ibdev->num_ports; ++i) {
-		if (ibdev->counters[i].index != -1 &&
-		    ibdev->counters[i].allocated)
-			mlx4_counter_free(ibdev->dev,
-					  ibdev->counters[i].index);
-	}
+	for (i = 0; i < ibdev->num_ports; ++i)
+		mlx4_ib_delete_counters_table(ibdev, &ibdev->counters_table[i]);
+
 err_map:
 	iounmap(ibdev->uar_map);
 
@@ -2548,9 +2586,8 @@  static void mlx4_ib_remove(struct mlx4_dev *dev, void *ibdev_ptr)
 
 	iounmap(ibdev->uar_map);
 	for (p = 0; p < ibdev->num_ports; ++p)
-		if (ibdev->counters[p].index != -1 &&
-		    ibdev->counters[p].allocated)
-			mlx4_counter_free(ibdev->dev, ibdev->counters[p].index);
+		mlx4_ib_delete_counters_table(ibdev, &ibdev->counters_table[p]);
+
 	mlx4_foreach_port(p, dev, MLX4_PORT_TYPE_IB)
 		mlx4_CLOSE_PORT(dev, p);
 
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 1e7b23b..4056dc1 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -528,10 +528,17 @@  struct mlx4_ib_iov_port {
 };
 
 struct counter_index {
+	struct  list_head       list;
 	u32		index;
 	u8		allocated;
 };
 
+struct mlx4_ib_counters {
+	struct list_head        counters_list;
+	struct mutex            mutex; /* mutex for accessing counters list */
+	u32			default_counter;
+};
+
 struct mlx4_ib_dev {
 	struct ib_device	ib_dev;
 	struct mlx4_dev	       *dev;
@@ -550,7 +557,7 @@  struct mlx4_ib_dev {
 	struct mutex		cap_mask_mutex;
 	bool			ib_active;
 	struct mlx4_ib_iboe	iboe;
-	struct counter_index    counters[MLX4_MAX_PORTS];
+	struct mlx4_ib_counters counters_table[MLX4_MAX_PORTS];
 	int		       *eq_table;
 	struct kobject	       *iov_parent;
 	struct kobject	       *ports_parent;
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 4ad9be3..382913e 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1460,6 +1460,7 @@  static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 	int sqd_event;
 	int steer_qp = 0;
 	int err = -EINVAL;
+	int counter_index;
 
 	/* APM is not supported under RoCE */
 	if (attr_mask & IB_QP_ALT_PATH &&
@@ -1543,9 +1544,10 @@  static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 	}
 
 	if (cur_state == IB_QPS_INIT && new_state == IB_QPS_RTR) {
-		if (dev->counters[qp->port - 1].index != -1) {
-			context->pri_path.counter_index =
-					dev->counters[qp->port - 1].index;
+		counter_index =
+			dev->counters_table[qp->port - 1].default_counter;
+		if (counter_index != -1) {
+			context->pri_path.counter_index = counter_index;
 			optpar |= MLX4_QP_OPTPAR_COUNTER_INDEX;
 		} else
 			context->pri_path.counter_index =