Message ID | CAAKD3BB5-hUC=0bDqxXxuag9TPqQEeMD74paHrAMOc-MsQwEow@mail.gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
On 12/24/2015 11:09 AM, Matan Barak wrote: > On Thu, Dec 24, 2015 at 4:07 PM, Matan Barak <matanb@dev.mellanox.co.il> wrote: >> On Thu, Dec 24, 2015 at 2:38 PM, Or Gerlitz <ogerlitz@mellanox.com> wrote: >>> On 12/24/2015 12:42 PM, Sagi Grimberg wrote: >>>> >>>> >>>>>> This patch seems to generate a list corruption [1] when I test >>>>>> with Doug's for-4.5 tree. Eran, care to take a look at this? >>>>> >>>>> >>>>> This patch is part from a series that was introduced in 4.3-rc1 [1], >>>> >>>> >>>> Then something else broke it. Can people check their patches on doug's >>>> tree? At the moment it's unusable... >>> >> >> Leon and I have checked Doug's tree with mlx4_ib disabled and we >> didn't encounter any error. >> We ran ucmatose over IB connection (in mlx5) and it worked flawlessly. >> >>> >>> Yes, I checked the branch up to commit 882f3b3 "Merge branches >>> '4.5/Or-cleanup' and '4.5/rdma-cq' into k.o/for-4.5" and it works (rping, >>> ibv_rc_pingpong over top of mlx4 VPI) >>> > > Regarding mlx4, Eran and I analyzed it. We didn't test that, but it > seems like the bug is introduced in the 64bit counters test. Here's a > proposal: > > diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c > index 539040f..8da3c83 100644 > --- a/drivers/infiniband/core/sysfs.c > +++ b/drivers/infiniband/core/sysfs.c > @@ -714,11 +714,12 @@ err: > * Figure out which counter table to use depending on > * the device capabilities. > */ > -static struct attribute_group *get_counter_table(struct ib_device *dev) > +static struct attribute_group *get_counter_table(struct ib_device *dev, > + int port_num) > { > struct ib_class_port_info cpi; > > - if (get_perf_mad(dev, 0, IB_PMA_CLASS_PORT_INFO, > + if (get_perf_mad(dev, port_num, IB_PMA_CLASS_PORT_INFO, > &cpi, 40, sizeof(cpi)) >= 0) { Your proposal is similar to earlier version of Christoph's patch but was changed since ClassPortInfo attribute does not have PortSelect field like other PerfMgt attributes which is where this port num would be placed. In ClassPortInfo attribute, that location would be the ClassVersion field that would be set to port number in PerfMgt Get query. -- Hal > > if (cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH) > @@ -776,7 +777,7 @@ static int add_port(struct ib_device *device, int port_num, > goto err_put; > } > > - p->pma_table = get_counter_table(device); > + p->pma_table = get_counter_table(device, port_num); > ret = sysfs_create_group(&p->kobj, p->pma_table); > if (ret) > goto err_put_gid_attrs; > > >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 12/25/2015 9:50 AM, Hal Rosenstock wrote: > On 12/24/2015 11:09 AM, Matan Barak wrote: >> On Thu, Dec 24, 2015 at 4:07 PM, Matan Barak <matanb@dev.mellanox.co.il> wrote: >>> On Thu, Dec 24, 2015 at 2:38 PM, Or Gerlitz <ogerlitz@mellanox.com> wrote: >>>> On 12/24/2015 12:42 PM, Sagi Grimberg wrote: >>>>> >>>>> >>>>>>> This patch seems to generate a list corruption [1] when I test >>>>>>> with Doug's for-4.5 tree. Eran, care to take a look at this? >>>>>> >>>>>> >>>>>> This patch is part from a series that was introduced in 4.3-rc1 [1], >>>>> >>>>> >>>>> Then something else broke it. Can people check their patches on doug's >>>>> tree? At the moment it's unusable... >>>> >>> >>> Leon and I have checked Doug's tree with mlx4_ib disabled and we >>> didn't encounter any error. >>> We ran ucmatose over IB connection (in mlx5) and it worked flawlessly. >>> >>>> >>>> Yes, I checked the branch up to commit 882f3b3 "Merge branches >>>> '4.5/Or-cleanup' and '4.5/rdma-cq' into k.o/for-4.5" and it works (rping, >>>> ibv_rc_pingpong over top of mlx4 VPI) >>>> >> >> Regarding mlx4, Eran and I analyzed it. We didn't test that, but it >> seems like the bug is introduced in the 64bit counters test. Here's a >> proposal: >> >> diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c >> index 539040f..8da3c83 100644 >> --- a/drivers/infiniband/core/sysfs.c >> +++ b/drivers/infiniband/core/sysfs.c >> @@ -714,11 +714,12 @@ err: >> * Figure out which counter table to use depending on >> * the device capabilities. >> */ >> -static struct attribute_group *get_counter_table(struct ib_device *dev) >> +static struct attribute_group *get_counter_table(struct ib_device *dev, >> + int port_num) >> { >> struct ib_class_port_info cpi; >> >> - if (get_perf_mad(dev, 0, IB_PMA_CLASS_PORT_INFO, >> + if (get_perf_mad(dev, port_num, IB_PMA_CLASS_PORT_INFO, >> &cpi, 40, sizeof(cpi)) >= 0) { > > Your proposal is similar to earlier version of Christoph's patch but was > changed since ClassPortInfo attribute does not have PortSelect field > like other PerfMgt attributes which is where this port num would be > placed. In ClassPortInfo attribute, that location would be the > ClassVersion field that would be set to port number in PerfMgt Get query. In actuality, I don't think it really matters as this is a Get not a Set and the PMA would do the right thing even if some field in the CPI were stepped on. > -- Hal > >> >> if (cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH) >> @@ -776,7 +777,7 @@ static int add_port(struct ib_device *device, int port_num, >> goto err_put; >> } >> >> - p->pma_table = get_counter_table(device); >> + p->pma_table = get_counter_table(device, port_num); >> ret = sysfs_create_group(&p->kobj, p->pma_table); >> if (ret) >> goto err_put_gid_attrs; >> >> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Dec 25, 2015 at 5:56 PM, Hal Rosenstock <hal@dev.mellanox.co.il> wrote: > On 12/25/2015 9:50 AM, Hal Rosenstock wrote: >> On 12/24/2015 11:09 AM, Matan Barak wrote: >>> On Thu, Dec 24, 2015 at 4:07 PM, Matan Barak <matanb@dev.mellanox.co.il> wrote: >>>> On Thu, Dec 24, 2015 at 2:38 PM, Or Gerlitz <ogerlitz@mellanox.com> wrote: >>>>> On 12/24/2015 12:42 PM, Sagi Grimberg wrote: >>>>>> >>>>>> >>>>>>>> This patch seems to generate a list corruption [1] when I test >>>>>>>> with Doug's for-4.5 tree. Eran, care to take a look at this? >>>>>>> >>>>>>> >>>>>>> This patch is part from a series that was introduced in 4.3-rc1 [1], >>>>>> >>>>>> >>>>>> Then something else broke it. Can people check their patches on doug's >>>>>> tree? At the moment it's unusable... >>>>> >>>> >>>> Leon and I have checked Doug's tree with mlx4_ib disabled and we >>>> didn't encounter any error. >>>> We ran ucmatose over IB connection (in mlx5) and it worked flawlessly. >>>> >>>>> >>>>> Yes, I checked the branch up to commit 882f3b3 "Merge branches >>>>> '4.5/Or-cleanup' and '4.5/rdma-cq' into k.o/for-4.5" and it works (rping, >>>>> ibv_rc_pingpong over top of mlx4 VPI) >>>>> >>> >>> Regarding mlx4, Eran and I analyzed it. We didn't test that, but it >>> seems like the bug is introduced in the 64bit counters test. Here's a >>> proposal: >>> >>> diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c >>> index 539040f..8da3c83 100644 >>> --- a/drivers/infiniband/core/sysfs.c >>> +++ b/drivers/infiniband/core/sysfs.c >>> @@ -714,11 +714,12 @@ err: >>> * Figure out which counter table to use depending on >>> * the device capabilities. >>> */ >>> -static struct attribute_group *get_counter_table(struct ib_device *dev) >>> +static struct attribute_group *get_counter_table(struct ib_device *dev, >>> + int port_num) >>> { >>> struct ib_class_port_info cpi; >>> >>> - if (get_perf_mad(dev, 0, IB_PMA_CLASS_PORT_INFO, >>> + if (get_perf_mad(dev, port_num, IB_PMA_CLASS_PORT_INFO, >>> &cpi, 40, sizeof(cpi)) >= 0) { >> >> Your proposal is similar to earlier version of Christoph's patch but was >> changed since ClassPortInfo attribute does not have PortSelect field >> like other PerfMgt attributes which is where this port num would be >> placed. In ClassPortInfo attribute, that location would be the >> ClassVersion field that would be set to port number in PerfMgt Get query. > > In actuality, I don't think it really matters as this is a Get not a Set > and the PMA would do the right thing even if some field in the CPI were > stepped on. > Well, it does matter as it calls the vendor driver with port_num = 0. Since the kernel is trusted, the vendor driver expects a valid port number. Giving it an invalid number might result in memory corruptions, as demonstrated in this case. >> -- Hal Matan >> >>> >>> if (cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH) >>> @@ -776,7 +777,7 @@ static int add_port(struct ib_device *device, int port_num, >>> goto err_put; >>> } >>> >>> - p->pma_table = get_counter_table(device); >>> + p->pma_table = get_counter_table(device, port_num); >>> ret = sysfs_create_group(&p->kobj, p->pma_table); >>> if (ret) >>> goto err_put_gid_attrs; >>> >>> >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >>>>> the body of a message to majordomo@vger.kernel.org >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c index 539040f..8da3c83 100644 --- a/drivers/infiniband/core/sysfs.c +++ b/drivers/infiniband/core/sysfs.c @@ -714,11 +714,12 @@ err: * Figure out which counter table to use depending on * the device capabilities. */ -static struct attribute_group *get_counter_table(struct ib_device *dev) +static struct attribute_group *get_counter_table(struct ib_device *dev, + int port_num) { struct ib_class_port_info cpi; - if (get_perf_mad(dev, 0, IB_PMA_CLASS_PORT_INFO, + if (get_perf_mad(dev, port_num, IB_PMA_CLASS_PORT_INFO, &cpi, 40, sizeof(cpi)) >= 0) { if (cpi.capability_mask && IB_PMA_CLASS_CAP_EXT_WIDTH) @@ -776,7 +777,7 @@ static int add_port(struct ib_device *device, int port_num, goto err_put; } - p->pma_table = get_counter_table(device); + p->pma_table = get_counter_table(device, port_num); ret = sysfs_create_group(&p->kobj, p->pma_table); if (ret) goto err_put_gid_attrs;