[v4,3/6] RDMA/i40iw: Eliminate duplicate barriers on weakly-ordered archs

Message ID 1521514068-8856-4-git-send-email-okaya@codeaurora.org (mailing list archive)
State New, archived

Commit Message

Sinan Kaya March 20, 2018, 2:47 a.m. UTC
The code issues wmb() followed by writel(). On some architectures, such as
arm64, writel() already includes a barrier.

As a result, the CPU executes two barriers back to back before performing
the register write.

Create a new wrapper function that uses the relaxed write operator, and use
it wherever a write follows a wmb().

Since the code already has an explicit barrier call, change writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/infiniband/hw/i40iw/i40iw_ctrl.c  |  6 ++++--
 drivers/infiniband/hw/i40iw/i40iw_osdep.h |  1 +
 drivers/infiniband/hw/i40iw/i40iw_uk.c    |  2 +-
 drivers/infiniband/hw/i40iw/i40iw_utils.c | 11 +++++++++++
 4 files changed, 17 insertions(+), 3 deletions(-)
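
The duplicate barrier described in the commit message can be illustrated
with a small userspace model. This is only a sketch: model_wmb(),
model_writel(), model_writel_relaxed(), arm_before() and arm_after() are
hypothetical stand-ins that count barriers, not the kernel primitives, and
model_writel() assumes an architecture (such as arm64) where writel()
issues a barrier before the store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; NOT the kernel implementations. */
static unsigned int barrier_count;	/* write barriers issued so far */
static uint32_t fake_reg;		/* pretend MMIO register */

static void model_wmb(void)
{
	barrier_count++;
}

/* Plain store with no implied barrier. */
static void model_writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	assert(addr != NULL);
	*addr = val;
}

/* Assumption: models an arch where writel() is barrier + store. */
static void model_writel(uint32_t val, volatile uint32_t *addr)
{
	model_wmb();
	model_writel_relaxed(val, addr);
}

/* Pattern before the patch: explicit wmb() followed by writel(). */
static unsigned int arm_before(uint32_t cq_id)
{
	barrier_count = 0;
	model_wmb();
	model_writel(cq_id, &fake_reg);
	return barrier_count;
}

/* Pattern after the patch: explicit wmb() followed by writel_relaxed(). */
static unsigned int arm_after(uint32_t cq_id)
{
	barrier_count = 0;
	model_wmb();
	model_writel_relaxed(cq_id, &fake_reg);
	return barrier_count;
}
```

In this model arm_before() issues two barriers for one register write,
while arm_after() issues one. The model says nothing about architectures
where writel() carries no barrier.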

Comments

Jason Gunthorpe March 20, 2018, 2:56 p.m. UTC | #1
On Mon, Mar 19, 2018 at 10:47:45PM -0400, Sinan Kaya wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier on
> some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Create a new wrapper function with relaxed write operator. Use the new
> wrapper when a write is following a wmb().
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>  drivers/infiniband/hw/i40iw/i40iw_ctrl.c  |  6 ++++--
>  drivers/infiniband/hw/i40iw/i40iw_osdep.h |  1 +
>  drivers/infiniband/hw/i40iw/i40iw_uk.c    |  2 +-
>  drivers/infiniband/hw/i40iw/i40iw_utils.c | 11 +++++++++++
>  4 files changed, 17 insertions(+), 3 deletions(-)

This one looks fine

Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

Jason
Shiraz Saleem March 21, 2018, 1:38 p.m. UTC | #2
On Mon, Mar 19, 2018 at 08:47:45PM -0600, Sinan Kaya wrote:
> Code includes wmb() followed by writel(). writel() already has a barrier on
> some architectures like arm64.
> 
> This ends up CPU observing two barriers back to back before executing the
> register write.
> 
> Create a new wrapper function with relaxed write operator. Use the new
> wrapper when a write is following a wmb().
> 
> Since code already has an explicit barrier call, changing writel() to
> writel_relaxed().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>

Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Jason Gunthorpe March 21, 2018, 8:02 p.m. UTC | #3
On Mon, Mar 19, 2018 at 10:47:45PM -0400, Sinan Kaya wrote:
> diff --git a/drivers/infiniband/hw/i40iw/i40iw_uk.c b/drivers/infiniband/hw/i40iw/i40iw_uk.c
> index 8afa5a6..7f0ebed 100644
> +++ b/drivers/infiniband/hw/i40iw/i40iw_uk.c
> @@ -723,7 +723,7 @@ static void i40iw_cq_request_notification(struct i40iw_cq_uk *cq,
>  
>  	wmb(); /* make sure WQE is populated before valid bit is set */
>  
> -	writel(cq->cq_id, cq->cqe_alloc_reg);
> +	writel_relaxed(cq->cq_id, cq->cqe_alloc_reg);
>  }

Ah, this one is probably not OK, i40iw_cq_request_notification is
called here:

	spin_lock_irqsave(&iwcq->lock, flags);
	ukcq->ops.iw_cq_request_notification(ukcq, cq_notify);
	spin_unlock_irqrestore(&iwcq->lock, flags);

So this needs to add mmiowb(); to keep the same semantics.

Generally I think you need to be very careful to ensure that any
conversion to _relaxed isn't contained by a spinlock, or that the
mmiowb() is present.

Maybe even do a first series with this obviously correct pattern:

 wmb();
 writel() -> writel_relaxed()
 writel() -> writel_relaxed()
 [..]
 mmiowb();

Jason
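
The ordering suggested in this review (one wmb(), a run of relaxed writes,
then mmiowb() before the lock is released) can be sketched as a userspace
model that only records the call sequence. All model_*() helpers,
log_event() and ring_doorbells_locked() are hypothetical names for
illustration; none of them is kernel or driver code, and the model does
not reproduce real memory-ordering behaviour.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-ins that only log what was called. */
static char trace[64];
static uint32_t fake_regs[2];	/* pretend MMIO registers */

static void log_event(const char *ev)
{
	/* Room for the event, a separating space and the terminator. */
	assert(strlen(trace) + strlen(ev) + 1 < sizeof(trace));
	strcat(trace, ev);
	strcat(trace, " ");
}

static void model_spin_lock(void)   { log_event("lock"); }
static void model_spin_unlock(void) { log_event("unlock"); }
static void model_wmb(void)         { log_event("wmb"); }
static void model_mmiowb(void)      { log_event("mmiowb"); }

static void model_writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
	log_event("write");
}

/*
 * One barrier up front, relaxed writes in the middle, and mmiowb()
 * before the unlock so the MMIO writes stay inside the locked region.
 */
static const char *ring_doorbells_locked(uint32_t a, uint32_t b)
{
	trace[0] = '\0';
	model_spin_lock();
	model_wmb();
	model_writel_relaxed(a, &fake_regs[0]);
	model_writel_relaxed(b, &fake_regs[1]);
	model_mmiowb();
	model_spin_unlock();
	return trace;
}
```

The point of the sketch is the position of mmiowb(): after the last
relaxed write and before the unlock, which is what the review asks for
when a converted write sits inside a spinlock.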
Sinan Kaya March 21, 2018, 9:01 p.m. UTC | #4
On 3/21/2018 3:02 PM, Jason Gunthorpe wrote:
> On Mon, Mar 19, 2018 at 10:47:45PM -0400, Sinan Kaya wrote:
>> diff --git a/drivers/infiniband/hw/i40iw/i40iw_uk.c b/drivers/infiniband/hw/i40iw/i40iw_uk.c
>> index 8afa5a6..7f0ebed 100644
>> +++ b/drivers/infiniband/hw/i40iw/i40iw_uk.c
>> @@ -723,7 +723,7 @@ static void i40iw_cq_request_notification(struct i40iw_cq_uk *cq,
>>  
>>  	wmb(); /* make sure WQE is populated before valid bit is set */
>>  
>> -	writel(cq->cq_id, cq->cqe_alloc_reg);
>> +	writel_relaxed(cq->cq_id, cq->cqe_alloc_reg);
>>  }
> 
> Ah, this one is probably not OK, i40iw_cq_request_notification is
> called here:
> 
> 	spin_lock_irqsave(&iwcq->lock, flags);
> 	ukcq->ops.iw_cq_request_notification(ukcq, cq_notify);
> 	spin_unlock_irqrestore(&iwcq->lock, flags);
> 
> So this needs to add mmiowb(); to keep the same semantics.
> 
> Generally I think you need to be very careful to ensure that any
> conversion to _relaxed isn't contained by a spinlock, or that the
> mmiowb() is present.
> 
> Maybe even do a first series with this obviously correct pattern:
> 
>  wmb();
>  writel() -> writel_relaxed()
>  writel() -> writel_relaxed()
>  [..]
>  mmiowb();

Good catch. I changed it as follows:

+++ b/drivers/infiniband/hw/i40iw/i40iw_uk.c
@@ -723,7 +723,8 @@ static void i40iw_cq_request_notification(struct i40iw_cq_uk *cq,

        wmb(); /* make sure WQE is populated before valid bit is set */

-       writel(cq->cq_id, cq->cqe_alloc_reg);
+       writel_relaxed(cq->cq_id, cq->cqe_alloc_reg);
+       mmiowb();
 }


> 
> Jason
>

Patch

diff --git a/drivers/infiniband/hw/i40iw/i40iw_ctrl.c b/drivers/infiniband/hw/i40iw/i40iw_ctrl.c
index c74fd33..47f473e 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_ctrl.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_ctrl.c
@@ -706,9 +706,11 @@  static void i40iw_sc_ccq_arm(struct i40iw_sc_cq *ccq)
 	wmb();       /* make sure shadow area is updated before arming */
 
 	if (ccq->dev->is_pf)
-		i40iw_wr32(ccq->dev->hw, I40E_PFPE_CQARM, ccq->cq_uk.cq_id);
+		i40iw_wr32_relaxed(ccq->dev->hw, I40E_PFPE_CQARM,
+				   ccq->cq_uk.cq_id);
 	else
-		i40iw_wr32(ccq->dev->hw, I40E_VFPE_CQARM1, ccq->cq_uk.cq_id);
+		i40iw_wr32_relaxed(ccq->dev->hw, I40E_VFPE_CQARM1,
+				   ccq->cq_uk.cq_id);
 }
 
 /**
diff --git a/drivers/infiniband/hw/i40iw/i40iw_osdep.h b/drivers/infiniband/hw/i40iw/i40iw_osdep.h
index f27be3e..e06f4b9 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_osdep.h
+++ b/drivers/infiniband/hw/i40iw/i40iw_osdep.h
@@ -213,5 +213,6 @@  void i40iw_hw_stats_start_timer(struct i40iw_sc_vsi *vsi);
 void i40iw_hw_stats_stop_timer(struct i40iw_sc_vsi *vsi);
 #define i40iw_mmiowb() mmiowb()
 void i40iw_wr32(struct i40iw_hw *hw, u32 reg, u32 value);
+void i40iw_wr32_relaxed(struct i40iw_hw *hw, u32 reg, u32 value);
 u32  i40iw_rd32(struct i40iw_hw *hw, u32 reg);
 #endif				/* _I40IW_OSDEP_H_ */
diff --git a/drivers/infiniband/hw/i40iw/i40iw_uk.c b/drivers/infiniband/hw/i40iw/i40iw_uk.c
index 8afa5a6..7f0ebed 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_uk.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_uk.c
@@ -723,7 +723,7 @@  static void i40iw_cq_request_notification(struct i40iw_cq_uk *cq,
 
 	wmb(); /* make sure WQE is populated before valid bit is set */
 
-	writel(cq->cq_id, cq->cqe_alloc_reg);
+	writel_relaxed(cq->cq_id, cq->cqe_alloc_reg);
 }
 
 /**
diff --git a/drivers/infiniband/hw/i40iw/i40iw_utils.c b/drivers/infiniband/hw/i40iw/i40iw_utils.c
index ddc1056..99aa6f8 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_utils.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_utils.c
@@ -125,6 +125,17 @@  inline void i40iw_wr32(struct i40iw_hw *hw, u32 reg, u32 value)
 }
 
 /**
+ * i40iw_wr32_relaxed - write 32 bits to hw register without ordering
+ * @hw: hardware information including registers
+ * @reg: register offset
+ * @value: value to write to register
+ */
+inline void i40iw_wr32_relaxed(struct i40iw_hw *hw, u32 reg, u32 value)
+{
+	writel_relaxed(value, hw->hw_addr + reg);
+}
+
+/**
  * i40iw_rd32 - read a 32 bit hw register
  * @hw: hardware information including registers
  * @reg: register offset