diff mbox

[net-next,05/22] cxgb4: Add T5 write combining support

Message ID 1363088794-31453-6-git-send-email-vipul@chelsio.com (mailing list archive)
State Superseded
Headers show

Commit Message

Vipul Pandya March 12, 2013, 11:46 a.m. UTC
From: Santosh Rastapur <santosh@chelsio.com>

This patch implements a low latency Write Combining (aka Write Coalescing) work
request path. PCIE maps User Space Doorbell BAR2 region writes to the new
interface to SGE. SGE pulls a new message from PCIE new interface and if its a
coalesced write work request then pushes it for processing. This patch copies
coalesced work request to memory mapped BAR2 space.

Signed-off-by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h      |    2 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |   53 +++++++++++++++++++++-
 drivers/net/ethernet/chelsio/cxgb4/sge.c        |   54 +++++++++++++++++++++-
 3 files changed, 104 insertions(+), 5 deletions(-)

Comments

David Miller March 12, 2013, 12:19 p.m. UTC | #1
From: Vipul Pandya <vipul@chelsio.com>
Date: Tue, 12 Mar 2013 17:16:17 +0530

> +				writel(n,  adap->bar2 + q->udb + 8);
> +#if defined(CONFIG_X86_32) || defined(CONFIG_X86_64)
> +			asm volatile("sfence" : : : "memory");
> +#endif

There is absolutely no way I'm letting anyone put crap like this
into a driver.

Use a portable inteface, and if one does not exist create one.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steve Wise March 12, 2013, 2:42 p.m. UTC | #2
On 3/12/2013 7:19 AM, David Miller wrote:
> From: Vipul Pandya <vipul@chelsio.com>
> Date: Tue, 12 Mar 2013 17:16:17 +0530
>
>> +				writel(n,  adap->bar2 + q->udb + 8);
>> +#if defined(CONFIG_X86_32) || defined(CONFIG_X86_64)
>> +			asm volatile("sfence" : : : "memory");
>> +#endif
> There is absolutely no way I'm letting anyone put crap like this
> into a driver.
>
> Use a portable inteface, and if one does not exist create one.

I guess you'll have to add a wc_wmb() function for all the hw platforms 
supported by the kernel.  I see libibverbs defines this for the user 
side in include/infiniband/arch.h, and that could be used as the meat of 
the hw platform-specific implementations.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vipul Pandya March 13, 2013, 2:36 p.m. UTC | #3
On 12-03-2013 20:12, Steve Wise wrote:
> On 3/12/2013 7:19 AM, David Miller wrote:
>> From: Vipul Pandya <vipul@chelsio.com>
>> Date: Tue, 12 Mar 2013 17:16:17 +0530
>>
>>> +				writel(n,  adap->bar2 + q->udb + 8);
>>> +#if defined(CONFIG_X86_32) || defined(CONFIG_X86_64)
>>> +			asm volatile("sfence" : : : "memory");
>>> +#endif
>> There is absolutely no way I'm letting anyone put crap like this
>> into a driver.
>>
>> Use a portable inteface, and if one does not exist create one.
> 
> I guess you'll have to add a wc_wmb() function for all the hw platforms 
> supported by the kernel.  I see libibverbs defines this for the user 
> side in include/infiniband/arch.h, and that could be used as the meat of 
> the hw platform-specific implementations.
> 
I see that normal wmb() code for x86_64 architecture is same as what
above #ifdef condition is doing. To be more clear I looked at the
assembly code for wmb and find that it is converted into 'sfence'
instruction. So, I think above code should be replaced with wmb call
which will also take care of portability on different architecture. I
will submit the series again soon.

I would like to request RDMA/cxgb4 and csiostor driver maintainer to
review the series if it has not been done already. It would be great If
I can include review comments from them in my next version of series.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Laight March 13, 2013, 3:43 p.m. UTC | #4
> >>> +				writel(n,  adap->bar2 + q->udb + 8);
> >>> +#if defined(CONFIG_X86_32) || defined(CONFIG_X86_64)
> >>> +			asm volatile("sfence" : : : "memory");
> >>> +#endif
> >> There is absolutely no way I'm letting anyone put crap like this
> >> into a driver.
> >>
> >> Use a portable inteface, and if one does not exist create one.
> >
> > I guess you'll have to add a wc_wmb() function for all the hw platforms
> > supported by the kernel.  I see libibverbs defines this for the user
> > side in include/infiniband/arch.h, and that could be used as the meat of
> > the hw platform-specific implementations.
> >
> I see that normal wmb() code for x86_64 architecture is same as what
> above #ifdef condition is doing. To be more clear I looked at the
> assembly code for wmb and find that it is converted into 'sfence'
> instruction. So, I think above code should be replaced with wmb call
> which will also take care of portability on different architecture. I
> will submit the series again soon.

From my recollection of the x86 architecture, the memory barriers
are hardly ever needed, certainly not in the places where, for example
a ppc needs them. I'd actually suspect that the normal wmb() for
x86 should be a nop.

About the only place where any on the fence instructions are needed
are in relation to write combining accesses.
In particular I don't believe they are ever needed to synchronise
uncached accesses with each other, or with cached accesses (which
are snooped).

	David




--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Casey Leedom March 13, 2013, 10:54 p.m. UTC | #5
On 03/13/13 08:43, David Laight wrote:
> From my recollection of the x86 architecture, the memory barriers are 
> hardly ever needed, certainly not in the places where, for example a 
> ppc needs them. I'd actually suspect that the normal wmb() for x86 
> should be a nop. About the only place where any on the fence 
> instructions are needed are in relation to write combining accesses. 
> In particular I don't believe they are ever needed to synchronise 
> uncached accesses with each other, or with cached accesses (which are 
> snooped). David 

   The question at hand is how should we indicate that we're finished 
with a Write Combined set of stores in an architecturally neutral 
manner?  Is wmb() a good approach?  Vipul has noted that the iWarp code 
uses a new "wc_wmb()" for this purpose which seems to be the same 
_currently_ as wmb() for all current platforms.  I presume that the 
iWarp folks pick a new name to offer themselves some cover in the future.

   And yes, the code sequence that was accidentally included in Vipul's 
previous patches should never have been submitted for inclusion in 
kernel.org.  It got missed in our internal reviews and I apologize for that.

Casey
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index a91dea6..f8ff30e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -439,6 +439,7 @@  struct sge_txq {
 	spinlock_t db_lock;
 	int db_disabled;
 	unsigned short db_pidx;
+	u64 udb;
 };
 
 struct sge_eth_txq {                /* state for an SGE Ethernet Tx queue */
@@ -543,6 +544,7 @@  enum chip_type {
 
 struct adapter {
 	void __iomem *regs;
+	void __iomem *bar2;
 	struct pci_dev *pdev;
 	struct device *pdev_dev;
 	unsigned int mbox;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 3d6d23a..ce1451c 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -1327,6 +1327,8 @@  static char stats_strings[][ETH_GSTRING_LEN] = {
 	"VLANinsertions     ",
 	"GROpackets         ",
 	"GROmerged          ",
+	"WriteCoalSuccess   ",
+	"WriteCoalFail      ",
 };
 
 static int get_sset_count(struct net_device *dev, int sset)
@@ -1422,11 +1424,25 @@  static void get_stats(struct net_device *dev, struct ethtool_stats *stats,
 {
 	struct port_info *pi = netdev_priv(dev);
 	struct adapter *adapter = pi->adapter;
+	u32 val1, val2;
 
 	t4_get_port_stats(adapter, pi->tx_chan, (struct port_stats *)data);
 
 	data += sizeof(struct port_stats) / sizeof(u64);
 	collect_sge_port_stats(adapter, pi, (struct queue_port_stats *)data);
+	data += sizeof(struct queue_port_stats) / sizeof(u64);
+	if (!is_t4(adapter->chip)) {
+		t4_write_reg(adapter, SGE_STAT_CFG, STATSOURCE_T5(7));
+		val1 = t4_read_reg(adapter, SGE_STAT_TOTAL);
+		val2 = t4_read_reg(adapter, SGE_STAT_MATCH);
+		*data = val1 - val2;
+		data++;
+		*data = val2;
+		data++;
+	} else {
+		memset(data, 0, 2 * sizeof(u64));
+		*data += 2;
+	}
 }
 
 /*
@@ -5337,10 +5353,11 @@  static void free_some_resources(struct adapter *adapter)
 #define TSO_FLAGS (NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_TSO_ECN)
 #define VLAN_FEAT (NETIF_F_SG | NETIF_F_IP_CSUM | TSO_FLAGS | \
 		   NETIF_F_IPV6_CSUM | NETIF_F_HIGHDMA)
+#define SEGMENT_SIZE 128
 
 static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
-	int func, i, err;
+	int func, i, err, s_qpp, qpp, num_seg;
 	struct port_info *pi;
 	bool highdma = false;
 	struct adapter *adapter = NULL;
@@ -5420,7 +5437,34 @@  static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	err = t4_prep_adapter(adapter);
 	if (err)
-		goto out_unmap_bar;
+		goto out_unmap_bar0;
+
+	if (!is_t4(adapter->chip)) {
+		s_qpp = QUEUESPERPAGEPF1 * adapter->fn;
+		qpp = 1 << QUEUESPERPAGEPF0_GET(t4_read_reg(adapter,
+		      SGE_EGRESS_QUEUES_PER_PAGE_PF) >> s_qpp);
+		num_seg = PAGE_SIZE / SEGMENT_SIZE;
+
+		/* Each segment size is 128B. Write coalescing is enabled only
+		 * when SGE_EGRESS_QUEUES_PER_PAGE_PF reg value for the
+		 * queue is less no of segments that can be accommodated in
+		 * a page size.
+		 */
+		if (qpp > num_seg) {
+			dev_err(&pdev->dev,
+				"Incorrect number of egress queues per page\n");
+			err = -EINVAL;
+			goto out_unmap_bar0;
+		}
+		adapter->bar2 = ioremap_wc(pci_resource_start(pdev, 2),
+		pci_resource_len(pdev, 2));
+		if (!adapter->bar2) {
+			dev_err(&pdev->dev, "cannot map device bar2 region\n");
+			err = -ENOMEM;
+			goto out_unmap_bar0;
+		}
+	}
+
 	setup_memwin(adapter);
 	err = adap_init0(adapter);
 	setup_memwin_rdma(adapter);
@@ -5552,6 +5596,9 @@  sriov:
  out_free_dev:
 	free_some_resources(adapter);
  out_unmap_bar:
+	if (!is_t4(adapter->chip))
+		iounmap(adapter->bar2);
+ out_unmap_bar0:
 	iounmap(adapter->regs);
  out_free_adapter:
 	kfree(adapter);
@@ -5602,6 +5649,8 @@  static void remove_one(struct pci_dev *pdev)
 
 		free_some_resources(adapter);
 		iounmap(adapter->regs);
+		if (!is_t4(adapter->chip))
+			iounmap(adapter->bar2);
 		kfree(adapter);
 		pci_disable_pcie_error_reporting(pdev);
 		pci_disable_device(pdev);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 7b17623..75e13d2 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -816,6 +816,22 @@  static void write_sgl(const struct sk_buff *skb, struct sge_txq *q,
 		*end = 0;
 }
 
+/* This function copies 64 byte coalesced work request to
+ * memory mapped BAR2 space(user space writes).
+ * For coalesced WR SGE, fetches data from the FIFO instead of from Host.
+ */
+static void cxgb_pio_copy(u64 __iomem *dst, u64 *src)
+{
+	int count = 8;
+
+	while (count) {
+		writeq(*src, dst);
+		src++;
+		dst++;
+		count--;
+	}
+}
+
 /**
  *	ring_tx_db - check and potentially ring a Tx queue's doorbell
  *	@adap: the adapter
@@ -826,11 +842,27 @@  static void write_sgl(const struct sk_buff *skb, struct sge_txq *q,
  */
 static inline void ring_tx_db(struct adapter *adap, struct sge_txq *q, int n)
 {
+	unsigned int *wr, index;
+
 	wmb();            /* write descriptors before telling HW */
 	spin_lock(&q->db_lock);
 	if (!q->db_disabled) {
-		t4_write_reg(adap, MYPF_REG(SGE_PF_KDOORBELL),
-			     QID(q->cntxt_id) | PIDX(n));
+		if (is_t4(adap->chip)) {
+			t4_write_reg(adap, MYPF_REG(SGE_PF_KDOORBELL),
+				     QID(q->cntxt_id) | PIDX(n));
+		} else {
+			if (n == 1) {
+				index = q->pidx ? (q->pidx - 1) : (q->size - 1);
+				wr = (unsigned int *)&q->desc[index];
+				cxgb_pio_copy((u64 __iomem *)
+					      (adap->bar2 + q->udb + 64),
+					      (u64 *)wr);
+			} else
+				writel(n,  adap->bar2 + q->udb + 8);
+#if defined(CONFIG_X86_32) || defined(CONFIG_X86_64)
+			asm volatile("sfence" : : : "memory");
+#endif
+		}
 	}
 	q->db_pidx = q->pidx;
 	spin_unlock(&q->db_lock);
@@ -2151,11 +2183,27 @@  err:
 
 static void init_txq(struct adapter *adap, struct sge_txq *q, unsigned int id)
 {
+	q->cntxt_id = id;
+	if (!is_t4(adap->chip)) {
+		unsigned int s_qpp;
+		unsigned short udb_density;
+		unsigned long qpshift;
+		int page;
+
+		s_qpp = QUEUESPERPAGEPF1 * adap->fn;
+		udb_density = 1 << QUEUESPERPAGEPF0_GET((t4_read_reg(adap,
+				SGE_EGRESS_QUEUES_PER_PAGE_PF) >> s_qpp));
+		qpshift = PAGE_SHIFT - ilog2(udb_density);
+		q->udb = q->cntxt_id << qpshift;
+		q->udb &= PAGE_MASK;
+		page = q->udb / PAGE_SIZE;
+		q->udb += (q->cntxt_id - (page * udb_density)) * 128;
+	}
+
 	q->in_use = 0;
 	q->cidx = q->pidx = 0;
 	q->stops = q->restarts = 0;
 	q->stat = (void *)&q->desc[q->size];
-	q->cntxt_id = id;
 	spin_lock_init(&q->db_lock);
 	adap->sge.egr_map[id - adap->sge.egr_start] = q;
 }