Message ID | 20150721025639.GX58053@asylum.americas.sgi.com (mailing list archive) |
---|---|
State | Not Applicable |

On Tue, Jul 21, 2015 at 5:56 AM, Alex Thorlton <athorlton@sgi.com> wrote:
> On Mon, Jul 20, 2015 at 11:28:03AM -0500, Alex Thorlton wrote:
>> I've got some time on the large machine later today. I'll give this a
>> try then.
>
> I ran a boot with this patch applied:
>
> diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
> index 83e80ab..c84aea0 100644
> --- a/include/linux/mlx4/device.h
> +++ b/include/linux/mlx4/device.h
> @@ -45,7 +45,7 @@
>  #include <linux/timecounter.h>
>
>  #define MAX_MSIX_P_PORT 17
> -#define MAX_MSIX 64
> +#define MAX_MSIX 8192
>  #define MSIX_LEGACY_SZ 4
>  #define MIN_MSIX_P_PORT 5
>
> I went for a max of 8192, since I was actually booting the machine with
> 6144 cores (not 4096) for this run. It doesn't look like this fixed the
> problem. I still saw the same errors during boot.
>
> FWIW, the module does appear to still successfully load:
>
> 8<---
> # lsmod | grep mlx
> mlx4_ib               151552  0
> ib_sa                  32768  1 mlx4_ib
> ib_mad                 49152  2 ib_sa,mlx4_ib
> ib_core               102400  3 ib_sa,mlx4_ib,ib_mad
> mlx4_core             278528  1 mlx4_ib
> --->8
>
> If the module loading is good enough, and we should just ignore the
> errors, then I'm fine with that. Just wanting to make sure that
> everything is behaving correctly.

It shouldn't be a problem, as all unused/erroneous EQs get "-1".
We'll try to reproduce the problem here, it might take awhile though.

Thanks for checking this,
Matan

>
> - Alex
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 83e80ab..c84aea0 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -45,7 +45,7 @@
 #include <linux/timecounter.h>

 #define MAX_MSIX_P_PORT 17
-#define MAX_MSIX 64
+#define MAX_MSIX 8192
 #define MSIX_LEGACY_SZ 4
 #define MIN_MSIX_P_PORT 5