Mlx4: BUG: unable to handle kernel at ffffffffa02be210

Message ID CAD+HZHVukRdGnfTWHin8O=hUkuOq0SS_aPqxbqjKCmk--OHn=w@mail.gmail.com (mailing list archive)
State Not Applicable

Commit Message

Jack Wang July 9, 2015, 1:35 p.m. UTC
2015-07-09 13:21 GMT+02:00 Or Gerlitz <ogerlitz@mellanox.com>:
> On 7/9/2015 2:14 PM, Jack Wang wrote:
>>
>> I managed to update the kernel to OFED 3.0 to verify the bug, but I
>> can still reproduce it. Maybe a synchronize_irq call is still
>> missing?
>
>
> Again, even if you don't use the upstream kernel for production, I suggest
> you try to reproduce the bug there. If it exists, we'll try to solve it
> upstream and later port the fix to MLNX OFED. Makes sense? You can start
> with just the installed 3.18.14.
>
> Or.
Hello Or,

We have other kernel modules as well as the autotest infrastructure
to deal with, so it's not that easy to install a 3.18.14 kernel.

I looked into the code a little. I think the bug may be related to the
radix_tree usage in mlx4_cq_free: the OFED code calls radix_tree_delete
before synchronize_irq, but the mainline code calls radix_tree_delete
after synchronize_irq. Does this matter? I'm building a new kernel with
this small change:

  wait_for_completion(&cq->free);
Thanks,
Jack
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Or Gerlitz July 9, 2015, 1:57 p.m. UTC | #1
On 7/9/2015 4:35 PM, Jack Wang wrote:
> We have other kernel modules as well as the autotest infrastructure
> to deal with, so it's not that easy to install a 3.18.14 kernel.

You said you are running 3.18.14 and just replaced its stock RDMA
stack with MLNX OFED.

>
> I looked into the code a little. I think the bug may be related to the
> radix_tree usage in mlx4_cq_free: the OFED code calls radix_tree_delete
> before synchronize_irq, but the mainline code calls radix_tree_delete
> after synchronize_irq. Does this matter?

Possibly yes; as in life, location && timing matter.

> I'm building a new kernel with
> this small change:


Patch

--- a/drivers/net/ethernet/mellanox/mlx4/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cq.c
@@ -393,16 +393,16 @@ void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq)
  if (err)
  mlx4_warn(dev, "HW2SW_CQ failed (%d) for CQN %06x\n", err, cq->cqn);

- spin_lock(&cq_table->lock);
- radix_tree_delete(&cq_table->tree, cq->cqn);
- spin_unlock(&cq_table->lock);
-
  synchronize_irq(priv->eq_table.eq[MLX4_CQ_TO_EQ_VECTOR(cq->vector)].irq);
  /* synchronize ASYNC irq */
  if (priv->eq_table.eq[MLX4_CQ_TO_EQ_VECTOR(cq->vector)].irq !=
     priv->eq_table.eq[MLX4_EQ_ASYNC].irq)
  synchronize_irq(priv->eq_table.eq[MLX4_EQ_ASYNC].irq);

+ spin_lock(&cq_table->lock);
+ radix_tree_delete(&cq_table->tree, cq->cqn);
+ spin_unlock(&cq_table->lock);
+
  if (atomic_dec_and_test(&cq->refcount))
  complete(&cq->free);
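
To make the ordering question in this thread concrete, here is a minimal
user-space sketch of the teardown order the patch proposes: wait for
in-flight handlers first, then remove the entry from the lookup table,
then drop the reference. Every name in it (model_cq, model_table,
model_lookup, model_wait_for_handlers, model_cq_free) is hypothetical
and is not part of the mlx4 driver. A plain array stands in for the
radix tree, a counter stands in for running interrupt handlers, and the
spinlock around the delete is omitted because the model is
single-threaded. It only illustrates the sequence of steps; it does not
show whether this ordering fixes the reported bug.

```c
/* User-space model only; none of these names exist in the mlx4 driver. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 16

struct model_cq {
	unsigned int cqn;      /* index into the lookup table */
	atomic_int refcount;   /* the caller sets the initial reference count */
	bool freed;            /* stands in for complete(&cq->free) */
};

struct model_table {
	struct model_cq *slot[TABLE_SIZE]; /* stands in for the radix tree */
	atomic_int handlers_in_flight;     /* handlers currently running */
};

/* Handler side: look up a CQ by number, as an interrupt handler would. */
static struct model_cq *model_lookup(struct model_table *t, unsigned int cqn)
{
	return cqn < TABLE_SIZE ? t->slot[cqn] : NULL;
}

/* Stand-in for synchronize_irq(): return once no handler is running. */
static void model_wait_for_handlers(struct model_table *t)
{
	while (atomic_load(&t->handlers_in_flight) > 0)
		; /* busy-wait here; the kernel call sleeps instead */
}

/* Teardown in the patched order: quiesce, then delete, then put. */
static void model_cq_free(struct model_table *t, struct model_cq *cq)
{
	assert(cq->cqn < TABLE_SIZE);

	model_wait_for_handlers(t);   /* 1. no handler is mid-lookup */
	t->slot[cq->cqn] = NULL;      /* 2. only now remove the entry */

	/* 3. drop one reference; the last one signals completion */
	if (atomic_fetch_sub(&cq->refcount, 1) == 1)
		cq->freed = true;
}
```

In this model, swapping steps 1 and 2 gives the order Jack describes for
the OFED code. The sketch is meant as a starting point for reasoning
about which handlers can still see the entry at each step, not as a
reproduction of the crash.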