diff mbox

IB/qib: defer hca error events to tasklet

Message ID 20110721132116.25683.74826.stgit@kop-dev-sles11-04.qlogic.org (mailing list archive)
State Accepted, archived
Headers show

Commit Message

Mike Marciniszyn July 21, 2011, 1:21 p.m. UTC
With ib_qib options as follows:

options ib_qib krcvqs=1 pcie_caps=0x51 rcvhdrcnt=4096 singleport=1 ibmtu=4

A run of ib_write_bw -a yields the following:

------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
 1048576   5000           2910.64            229.80
------------------------------------------------------------------

The top cpu use in a profile is:

CPU: Intel Architectural Perfmon, speed 2400.15 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask
of 0x00 (No unit mask) count 1002300
Counted LLC_MISSES events (Last level cache demand requests from this core that
missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000
samples  %        samples  %        app name                 symbol name
15237    29.2642  964      17.1195  ib_qib.ko                qib_7322intr
12320    23.6618  1040     18.4692  ib_qib.ko                handle_7322_errors
4106      7.8860  0              0  vmlinux                  vsnprintf


Analysis of the stats, profile, the code, and the annotated profile indicate:
- All of the overflow interrupts (one per packet overflow) are serviced
  on CPU0 with no mitigation on the frequency.
- All of the receive interrupts are being serviced by CPU0.  (That is the way
  truescale.cmds statically allocates the kctx IRQs to CPU.)
- The code is spending all of its time servicing QIB_I_C_ERROR RcvEgrFullErr
  interrupts on CPU0, starving the packet receive processing
- The decode_err routine is very inefficient, using a printf variant to format
  a "%s" and continues to loop when the errs mask has been cleared
- Both qib_7322intr and handle_7322_errors read pci registers, which is very
  inefficient.

The fix does the following:
- Adds a tasklet to service the QIB_I_C_ERROR
- replaces the very inefficient scnprintf() with a memcpy().  A field is added
  to qib_hwerror_msgs to save the sizeof("string") at compile time so that a
  strlen is not needed during err_decode().
- The most frequent errors (Overflows) are serviced first to exit the loop as
  early as possible
- The loop now exits as soon as the errs mask is clear rather than fruitlessly
  looping through the msp array

With this fix the performance changes to:

------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
 1048576   5000           2990.64            2941.35
------------------------------------------------------------------

During testing of the error handling overflow patch, it was determined
that some CPU's were slower when servicing both overflow
and receive interrupts on CPU0 with different MSI interrupt vectors.

This patch adds an option (krcvq01_no_msi) to not use a dedicated MSI
interrupt for kctx's < 2 and to service them on the default
interrupt.  For some CPUs, the cost of the interrupt enter/exit is
more costly than then the additional PCI read in the default handler.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
---
 drivers/infiniband/hw/qib/qib.h         |    3 +
 drivers/infiniband/hw/qib/qib_iba7322.c |   71 ++++++++++++++++++++++---------
 2 files changed, 53 insertions(+), 21 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Roland Dreier July 21, 2011, 3:46 p.m. UTC | #1
> +static ushort qib_krcvq01_no_msi;
> +module_param_named(krcvq01_no_msi, qib_krcvq01_no_msi, ushort, S_IRUGO);
> +MODULE_PARM_DESC(krcvq01_no_msi, "No MSI for kctx < 2");

First, the obvious question: is there really no better way to handle this
than to have yet another cryptically named module parameter for users
to try both ways?

Second, why can't this just be module_param() (why is _named needed?)?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mike Marciniszyn July 21, 2011, 4:30 p.m. UTC | #2
There is no choice but a module parameter since the msix initialization is done at driver load, and is only needed after testing reveals that the change is necessary.

I agree that there are a lot of module parameters.   Some that don't allocate resources at load time could be "sys".

I can re-issue the patch with module_param and change the identifier.

The module parameters used throughout the code usually prepend a qib_ or and ib_qib_, and I was following precedent in the code.

The only "no prepend" I see is the sdma_descq_cnt, which could change as well to use module_param.

Mike

> -----Original Message-----
> From: roland@purestorage.com [mailto:roland@purestorage.com] On Behalf
> Of Roland Dreier
> Sent: Thursday, July 21, 2011 11:47 AM
> To: Mike Marciniszyn
> Cc: linux-rdma@vger.kernel.org
> Subject: Re: [PATCH] IB/qib: defer hca error events to tasklet
>
> > +static ushort qib_krcvq01_no_msi;
> > +module_param_named(krcvq01_no_msi, qib_krcvq01_no_msi, ushort,
> S_IRUGO);
> > +MODULE_PARM_DESC(krcvq01_no_msi, "No MSI for kctx < 2");
>
> First, the obvious question: is there really no better way to handle
> this
> than to have yet another cryptically named module parameter for users
> to try both ways?
>
> Second, why can't this just be module_param() (why is _named needed?)?


This message and any attached documents contain information from QLogic Corporation or its wholly-owned subsidiaries that may be confidential. If you are not the intended recipient, you may not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Roland Dreier July 22, 2011, 7:01 p.m. UTC | #3
On Thu, Jul 21, 2011 at 9:30 AM, Mike Marciniszyn
<mike.marciniszyn@qlogic.com> wrote:
> There is no choice but a module parameter since the msix initialization is done at driver load, and is only needed after testing reveals that the change is necessary.
>
> I agree that there are a lot of module parameters.   Some that don't allocate resources at load time could be "sys".
>
> I can re-issue the patch with module_param and change the identifier.
>
> The module parameters used throughout the code usually prepend a qib_ or and ib_qib_, and I was following precedent in the code.
>
> The only "no prepend" I see is the sdma_descq_cnt, which could change as well to use module_param.

I looked over the code and I think it makes sense to match the
existing practice with module_param.
Longer term cleaning that up might be worthwhile, but even better
would be to get
rid of the need for so many knobs for users to adjust.

In any case I went ahead and applied this, since I don't think it's
worth quibbling over.

 - R.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/hw/qib/qib.h b/drivers/infiniband/hw/qib/qib.h
index 769a1d9..c9624ea 100644
--- a/drivers/infiniband/hw/qib/qib.h
+++ b/drivers/infiniband/hw/qib/qib.h
@@ -1012,6 +1012,8 @@  struct qib_devdata {
 	u8 psxmitwait_supported;
 	/* cycle length of PS* counters in HW (in picoseconds) */
 	u16 psxmitwait_check_rate;
+	/* high volume overflow errors defered to tasklet */
+	struct tasklet_struct error_tasklet;
 };
 
 /* hol_state values */
@@ -1433,6 +1435,7 @@  extern struct mutex qib_mutex;
 struct qib_hwerror_msgs {
 	u64 mask;
 	const char *msg;
+	size_t sz;
 };
 
 #define QLOGIC_IB_HWE_MSG(a, b) { .mask = a, .msg = b }
diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c b/drivers/infiniband/hw/qib/qib_iba7322.c
index 821226c..5ea9ece 100644
--- a/drivers/infiniband/hw/qib/qib_iba7322.c
+++ b/drivers/infiniband/hw/qib/qib_iba7322.c
@@ -114,6 +114,10 @@  static ushort qib_singleport;
 module_param_named(singleport, qib_singleport, ushort, S_IRUGO);
 MODULE_PARM_DESC(singleport, "Use only IB port 1; more per-port buffer space");
 
+static ushort qib_krcvq01_no_msi;
+module_param_named(krcvq01_no_msi, qib_krcvq01_no_msi, ushort, S_IRUGO);
+MODULE_PARM_DESC(krcvq01_no_msi, "No MSI for kctx < 2");
+
 /*
  * Receive header queue sizes
  */
@@ -1106,9 +1110,9 @@  static inline u32 read_7322_creg32_port(const struct qib_pportdata *ppd,
 #define AUTONEG_TRIES 3 /* sequential retries to negotiate DDR */
 
 #define HWE_AUTO(fldname) { .mask = SYM_MASK(HwErrMask, fldname##Mask), \
-	.msg = #fldname }
+	.msg = #fldname , .sz = sizeof(#fldname) }
 #define HWE_AUTO_P(fldname, port) { .mask = SYM_MASK(HwErrMask, \
-	fldname##Mask##_##port), .msg = #fldname }
+	fldname##Mask##_##port), .msg = #fldname , .sz = sizeof(#fldname) }
 static const struct qib_hwerror_msgs qib_7322_hwerror_msgs[] = {
 	HWE_AUTO_P(IBSerdesPClkNotDetect, 1),
 	HWE_AUTO_P(IBSerdesPClkNotDetect, 0),
@@ -1126,14 +1130,16 @@  static const struct qib_hwerror_msgs qib_7322_hwerror_msgs[] = {
 	HWE_AUTO_P(IBCBusFromSPCParityErr, 0),
 	HWE_AUTO(statusValidNoEop),
 	HWE_AUTO(LATriggered),
-	{ .mask = 0 }
+	{ .mask = 0, .sz = 0 }
 };
 
 #define E_AUTO(fldname) { .mask = SYM_MASK(ErrMask, fldname##Mask), \
-	.msg = #fldname }
+	.msg = #fldname, .sz = sizeof(#fldname) }
 #define E_P_AUTO(fldname) { .mask = SYM_MASK(ErrMask_0, fldname##Mask), \
-	.msg = #fldname }
+	.msg = #fldname, .sz = sizeof(#fldname) }
 static const struct qib_hwerror_msgs qib_7322error_msgs[] = {
+	E_AUTO(RcvEgrFullErr),
+	E_AUTO(RcvHdrFullErr),
 	E_AUTO(ResetNegated),
 	E_AUTO(HardwareErr),
 	E_AUTO(InvalidAddrErr),
@@ -1146,9 +1152,7 @@  static const struct qib_hwerror_msgs qib_7322error_msgs[] = {
 	E_AUTO(SendSpecialTriggerErr),
 	E_AUTO(SDmaWrongPortErr),
 	E_AUTO(SDmaBufMaskDuplicateErr),
-	E_AUTO(RcvHdrFullErr),
-	E_AUTO(RcvEgrFullErr),
-	{ .mask = 0 }
+	{ .mask = 0, .sz = 0 }
 };
 
 static const struct  qib_hwerror_msgs qib_7322p_error_msgs[] = {
@@ -1158,7 +1162,8 @@  static const struct  qib_hwerror_msgs qib_7322p_error_msgs[] = {
 	/*
 	 * SDmaHaltErr is not really an error, make it clearer;
 	 */
-	{.mask = SYM_MASK(ErrMask_0, SDmaHaltErrMask), .msg = "SDmaHalted"},
+	{.mask = SYM_MASK(ErrMask_0, SDmaHaltErrMask), .msg = "SDmaHalted",
+		.sz = 11},
 	E_P_AUTO(SDmaDescAddrMisalignErr),
 	E_P_AUTO(SDmaUnexpDataErr),
 	E_P_AUTO(SDmaMissingDwErr),
@@ -1194,7 +1199,7 @@  static const struct  qib_hwerror_msgs qib_7322p_error_msgs[] = {
 	E_P_AUTO(RcvICRCErr),
 	E_P_AUTO(RcvVCRCErr),
 	E_P_AUTO(RcvFormatErr),
-	{ .mask = 0 }
+	{ .mask = 0, .sz = 0 }
 };
 
 /*
@@ -1202,17 +1207,17 @@  static const struct  qib_hwerror_msgs qib_7322p_error_msgs[] = {
  * context
  */
 #define INTR_AUTO(fldname) { .mask = SYM_MASK(IntMask, fldname##Mask), \
-	.msg = #fldname }
+	.msg = #fldname, .sz = sizeof(#fldname) }
 /* Below generates "auto-message" for interrupts specific to a port */
 #define INTR_AUTO_P(fldname) { .mask = MASK_ACROSS(\
 	SYM_LSB(IntMask, fldname##Mask##_0), \
 	SYM_LSB(IntMask, fldname##Mask##_1)), \
-	.msg = #fldname "_P" }
+	.msg = #fldname "_P", .sz = sizeof(#fldname "_P") }
 /* For some reason, the SerDesTrimDone bits are reversed */
 #define INTR_AUTO_PI(fldname) { .mask = MASK_ACROSS(\
 	SYM_LSB(IntMask, fldname##Mask##_1), \
 	SYM_LSB(IntMask, fldname##Mask##_0)), \
-	.msg = #fldname "_P" }
+	.msg = #fldname "_P", .sz = sizeof(#fldname "_P") }
 /*
  * Below generates "auto-message" for interrupts specific to a context,
  * with ctxt-number appended
@@ -1220,7 +1225,7 @@  static const struct  qib_hwerror_msgs qib_7322p_error_msgs[] = {
 #define INTR_AUTO_C(fldname) { .mask = MASK_ACROSS(\
 	SYM_LSB(IntMask, fldname##0IntMask), \
 	SYM_LSB(IntMask, fldname##17IntMask)), \
-	.msg = #fldname "_C"}
+	.msg = #fldname "_C", .sz = sizeof(#fldname "_C") }
 
 static const struct  qib_hwerror_msgs qib_7322_intr_msgs[] = {
 	INTR_AUTO_P(SDmaInt),
@@ -1234,11 +1239,12 @@  static const struct  qib_hwerror_msgs qib_7322_intr_msgs[] = {
 	INTR_AUTO_P(SendDoneInt),
 	INTR_AUTO(SendBufAvailInt),
 	INTR_AUTO_C(RcvAvail),
-	{ .mask = 0 }
+	{ .mask = 0, .sz = 0 }
 };
 
 #define TXSYMPTOM_AUTO_P(fldname) \
-	{ .mask = SYM_MASK(SendHdrErrSymptom_0, fldname), .msg = #fldname }
+	{ .mask = SYM_MASK(SendHdrErrSymptom_0, fldname), \
+	.msg = #fldname, .sz = sizeof(#fldname) }
 static const struct  qib_hwerror_msgs hdrchk_msgs[] = {
 	TXSYMPTOM_AUTO_P(NonKeyPacket),
 	TXSYMPTOM_AUTO_P(GRHFail),
@@ -1247,7 +1253,7 @@  static const struct  qib_hwerror_msgs hdrchk_msgs[] = {
 	TXSYMPTOM_AUTO_P(SLIDFail),
 	TXSYMPTOM_AUTO_P(RawIPV6),
 	TXSYMPTOM_AUTO_P(PacketTooSmall),
-	{ .mask = 0 }
+	{ .mask = 0, .sz = 0 }
 };
 
 #define IBA7322_HDRHEAD_PKTINT_SHIFT 32 /* interrupt cnt in upper 32 bits */
@@ -1292,7 +1298,7 @@  static void err_decode(char *msg, size_t len, u64 errs,
 	u64 these, lmask;
 	int took, multi, n = 0;
 
-	while (msp && msp->mask) {
+	while (errs && msp && msp->mask) {
 		multi = (msp->mask & (msp->mask - 1));
 		while (errs & msp->mask) {
 			these = (errs & msp->mask);
@@ -1303,9 +1309,14 @@  static void err_decode(char *msg, size_t len, u64 errs,
 					*msg++ = ',';
 					len--;
 				}
-				took = scnprintf(msg, len, "%s", msp->msg);
+				BUG_ON(!msp->sz);
+				/* msp->sz counts the nul */
+				took = min_t(size_t, msp->sz - (size_t)1, len);
+				memcpy(msg,  msp->msg, took);
 				len -= took;
 				msg += took;
+				if (len)
+					*msg = '\0';
 			}
 			errs &= ~lmask;
 			if (len && multi) {
@@ -1643,6 +1654,14 @@  done:
 	return;
 }
 
+static void qib_error_tasklet(unsigned long data)
+{
+	struct qib_devdata *dd = (struct qib_devdata *)data;
+
+	handle_7322_errors(dd);
+	qib_write_kreg(dd, kr_errmask, dd->cspec->errormask);
+}
+
 static void reenable_chase(unsigned long opaque)
 {
 	struct qib_pportdata *ppd = (struct qib_pportdata *)opaque;
@@ -2724,8 +2743,10 @@  static noinline void unlikely_7322_intr(struct qib_devdata *dd, u64 istat)
 		unknown_7322_ibits(dd, istat);
 	if (istat & QIB_I_GPIO)
 		unknown_7322_gpio_intr(dd);
-	if (istat & QIB_I_C_ERROR)
-		handle_7322_errors(dd);
+	if (istat & QIB_I_C_ERROR) {
+		qib_write_kreg(dd, kr_errmask, 0ULL);
+		tasklet_schedule(&dd->error_tasklet);
+	}
 	if (istat & INT_MASK_P(Err, 0) && dd->rcd[0])
 		handle_7322_p_errors(dd->rcd[0]->ppd);
 	if (istat & INT_MASK_P(Err, 1) && dd->rcd[1])
@@ -3124,6 +3145,8 @@  try_intx:
 			arg = dd->rcd[ctxt];
 			if (!arg)
 				continue;
+			if (qib_krcvq01_no_msi && ctxt < 2)
+				continue;
 			lsb = QIB_I_RCVAVAIL_LSB + ctxt;
 			handler = qib_7322pintr;
 			name = QIB_DRV_NAME " (kctx)";
@@ -3158,6 +3181,8 @@  try_intx:
 	for (i = 0; i < ARRAY_SIZE(redirect); i++)
 		qib_write_kreg(dd, kr_intredirect + i, redirect[i]);
 	dd->cspec->main_int_mask = mask;
+	tasklet_init(&dd->error_tasklet, qib_error_tasklet,
+		(unsigned long)dd);
 bail:;
 }
 
@@ -6787,6 +6812,10 @@  struct qib_devdata *qib_init_iba7322_funcs(struct pci_dev *pdev,
 		    (i >= ARRAY_SIZE(irq_table) &&
 		     dd->rcd[i - ARRAY_SIZE(irq_table)]))
 			actual_cnt++;
+	/* reduce by ctxt's < 2 */
+	if (qib_krcvq01_no_msi)
+		actual_cnt -= dd->num_pports;
+
 	tabsize = actual_cnt;
 	dd->cspec->msix_entries = kmalloc(tabsize *
 			sizeof(struct msix_entry), GFP_KERNEL);