From patchwork Thu Aug 30 10:38:05 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Or Gerlitz X-Patchwork-Id: 1387181 Return-Path: X-Original-To: patchwork-linux-rdma@patchwork.kernel.org Delivered-To: patchwork-process-083081@patchwork1.kernel.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by patchwork1.kernel.org (Postfix) with ESMTP id 3C0CA3FC33 for ; Thu, 30 Aug 2012 10:38:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751263Ab2H3KiV (ORCPT ); Thu, 30 Aug 2012 06:38:21 -0400 Received: from eu1sys200aog103.obsmtp.com ([207.126.144.115]:58798 "HELO eu1sys200aog103.obsmtp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1750852Ab2H3KiV (ORCPT ); Thu, 30 Aug 2012 06:38:21 -0400 Received: from mtlsws123.lab.mtl.com ([82.166.227.17]) (using TLSv1) by eu1sys200aob103.postini.com ([207.126.147.11]) with SMTP ID DSNKUD9CmZ/G//yZGZ/fT84/Hf3v1b/E0SHE@postini.com; Thu, 30 Aug 2012 10:38:20 UTC Received: from r-vnc04.lab.mtl.com (r-vnc04.lab.mtl.com [10.208.0.116]) by mtlsws123.lab.mtl.com (8.13.8/8.13.8) with ESMTP id q7UAcFiV030135; Thu, 30 Aug 2012 13:38:15 +0300 From: Or Gerlitz To: roland@kernel.org Cc: linux-rdma@vger.kernel.org, Yishai Hadas , Or Gerlitz Subject: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler Date: Thu, 30 Aug 2012 13:38:05 +0300 Message-Id: <1346323085-2665-1-git-send-email-ogerlitz@mellanox.com> X-Mailer: git-send-email 1.7.8.2 Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org From: Yishai Hadas The mlx4 CQ completion handler, mlx4_cq_completion doesn't bother to lock the radix tree which is used to manage the table of CQs, nor to increase the reference count of the CQ before invoking the user provided callback. This is racy and can cause use after free, null pointer dereference, etc various crashes. Fix that by wrapping the radix tree lookup with taking the table lock, and increase the ref count for the time of the callback. Signed-off-by: Yishai Hadas Signed-off-by: Or Gerlitz --- Hi Roland, Doing some code review on the related parts, I didn't see any way the handler can be protected for the tree lockup and the user callback invokation other then using these methods, e.g as done by the "event" callback. Looking on mthca, I see the same behaviour -- mthca_cq_completion takes no locks and doesn't increase the refcount while mthca_cq_event does go by the book and do what ever needed, anything we miss here? its there from day one... Or. drivers/net/ethernet/mellanox/mlx4/cq.c | 7 +++++++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c b/drivers/net/ethernet/mellanox/mlx4/cq.c index 7e64033..05d9529 100644 --- a/drivers/net/ethernet/mellanox/mlx4/cq.c +++ b/drivers/net/ethernet/mellanox/mlx4/cq.c @@ -56,9 +56,14 @@ void mlx4_cq_completion(struct mlx4_dev *dev, u32 cqn) { struct mlx4_cq *cq; + struct mlx4_cq_table *cq_table = &mlx4_priv(dev)->cq_table; + spin_lock(&cq_table->lock); cq = radix_tree_lookup(&mlx4_priv(dev)->cq_table.tree, cqn & (dev->caps.num_cqs - 1)); + if (cq) + atomic_inc(&cq->refcount); + spin_unlock(&cq_table->lock); if (!cq) { mlx4_dbg(dev, "Completion event for bogus CQ %08x\n", cqn); return; @@ -67,6 +72,8 @@ void mlx4_cq_completion(struct mlx4_dev *dev, u32 cqn) ++cq->arm_sn; cq->comp(cq); + if (atomic_dec_and_test(&cq->refcount)) + complete(&cq->free); } void mlx4_cq_event(struct mlx4_dev *dev, u32 cqn, int event_type)