Message ID | 20150414141806.GA7354@phlsvsds.ph.intel.com (mailing list archive) |
---|---|
State | Rejected |
Headers | show |
On 04/14/2015 04:18 PM, ira.weiny wrote: [snip] > > /** > - * cap_ib_cm_dev - Check if any port of device has the capability Infiniband > - * Communication Manager. > + * cap_ib_cm_any_port - Check if any port of the device has Infiniband > + * Communication Manager (CM) support. > * > * @device: Device to be checked > * > - * Return 0 when all port of the device don't support Infiniband > - * Communication Manager. > + * Return 1 if any port of the device supports the IB CM. > */ > -static inline int cap_ib_cm_dev(struct ib_device *device) > +static inline int cap_ib_cm_any_port(struct ib_device *device) > { > int i; I think we maybe able to get rid of this helper according to Sean's suggestion :-) We just need to check the port 1 of HCA see if it support ib cm, seems like currently there is no case that port 1 support cm while others doesn't. Regards, Michael Wang > > >> >> That seems reasonable, and solves the #10 problem, but we should >> enforce this invariant during device register. >> >> Typically the ports seem to be completely orthogonal (like SA), so in those >> cases the _dev and restriction makes no sense. > > While the ports in ib_sa and ib_umad probably can be orthogonal the current > implementation does not support that and this patch series obscures that a bit. > >> >> CM seems to be different, so it should probably enforce its rules > > Technically, the implementation of the ib_sa and ib_umad modules are not > different it is just that Michaels patch is not broken. > > I'm not saying we can't change the ib_sa and ib_umad modules but the current > logic is all or nothing. And I think changing this : > > 1) require more review and testing > 2) is not the purpose of this series. > > Ira > >> >> Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 14, 2015 at 04:32:57PM +0200, Michael Wang wrote: > > > On 04/14/2015 04:18 PM, ira.weiny wrote: > [snip] > > > > /** > > - * cap_ib_cm_dev - Check if any port of device has the capability Infiniband > > - * Communication Manager. > > + * cap_ib_cm_any_port - Check if any port of the device has Infiniband > > + * Communication Manager (CM) support. > > * > > * @device: Device to be checked > > * > > - * Return 0 when all port of the device don't support Infiniband > > - * Communication Manager. > > + * Return 1 if any port of the device supports the IB CM. > > */ > > -static inline int cap_ib_cm_dev(struct ib_device *device) > > +static inline int cap_ib_cm_any_port(struct ib_device *device) > > { > > int i; > > I think we maybe able to get rid of this helper according to Sean's suggestion :-) > > We just need to check the port 1 of HCA see if it support ib cm, seems like > currently there is no case that port 1 support cm while others doesn't. But that moves us in the wrong direction. If we later support port 2 without port 1 that code will be broken. Ira -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/14/2015 05:40 PM, ira.weiny wrote: > On Tue, Apr 14, 2015 at 04:32:57PM +0200, Michael Wang wrote: >> >> >> On 04/14/2015 04:18 PM, ira.weiny wrote: >> [snip] >>> >>> /** >>> - * cap_ib_cm_dev - Check if any port of device has the capability Infiniband >>> - * Communication Manager. >>> + * cap_ib_cm_any_port - Check if any port of the device has Infiniband >>> + * Communication Manager (CM) support. >>> * >>> * @device: Device to be checked >>> * >>> - * Return 0 when all port of the device don't support Infiniband >>> - * Communication Manager. >>> + * Return 1 if any port of the device supports the IB CM. >>> */ >>> -static inline int cap_ib_cm_dev(struct ib_device *device) >>> +static inline int cap_ib_cm_any_port(struct ib_device *device) >>> { >>> int i; >> >> I think we maybe able to get rid of this helper according to Sean's suggestion :-) >> >> We just need to check the port 1 of HCA see if it support ib cm, seems like >> currently there is no case that port 1 support cm while others doesn't. > > But that moves us in the wrong direction. If we later support port 2 without > port 1 that code will be broken. I'm not sure if we should sacrifice the consistency at this moment for such 'future' capability... maybe we can leave such reform work to those who introduce the new capability? Regards, Michael Wang > > Ira > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> But that moves us in the wrong direction. If we later support port 2 > without > port 1 that code will be broken. I agree that the code will be broken, but supporting that model requires a lot more work in how the ib_cm listens across devices. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 14, 2015 at 10:18:07AM -0400, ira.weiny wrote: > After more thought and reading other opinions, I must agree we should not > have cap_foo_dev. I looked at it a bit, and I think Sean has also basically said, CM does not support certain mixed port combinations. iWarp and IB simply cannot be mixed with the current CM and it doesn't look easy to fix that. We can fix a few point areas simply, but not all of it. So we have to have the _dev tests, only to fill the CM's need and they must have the all true/all false/BUG semantics CM demands. Verify on register. > While the ports in ib_sa and ib_umad probably can be orthogonal the current > implementation does not support that and this patch series obscures that a bit. Really? Do you see any bugs/missed things? Both were made port orthogonal when RoCEE was added, because RoCEE needs that. CM wasn't because RoCEE and IB seem to use almost the same code, though I wonder if mixing really works 100%.. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 14, 2015 at 11:25:15AM -0600, Jason Gunthorpe wrote: > On Tue, Apr 14, 2015 at 10:18:07AM -0400, ira.weiny wrote: > > > After more thought and reading other opinions, I must agree we should not > > have cap_foo_dev. > > I looked at it a bit, and I think Sean has also basically said, CM > does not support certain mixed port combinations. iWarp and IB simply > cannot be mixed with the current CM and it doesn't look easy to fix > that. We can fix a few point areas simply, but not all of it. > > So we have to have the _dev tests, only to fill the CM's need and they > must have the all true/all false/BUG semantics CM demands. > > Verify on register. > > > While the ports in ib_sa and ib_umad probably can be orthogonal the current > > implementation does not support that and this patch series obscures that a bit. > > Really? Do you see any bugs/missed things? Both were made port > orthogonal when RoCEE was added, because RoCEE needs that. They are not completely orthogonal: A failure to init port 2 ends up ends up "killing" port 1 and releasing the device associated resources. static void ib_umad_add_one(struct ib_device *device) { ... if (ib_umad_init_port(device, i, umad_dev, &umad_dev->port[i - s])) goto err; ... err: while (--i >= s) { if (!cap_ib_mad(device, i)) continue; ib_umad_kill_port(&umad_dev->port[i - s]); } kobject_put(&umad_dev->kobj); } > > CM wasn't because RoCEE and IB seem to use almost the same code, > though I wonder if mixing really works 100%.. The support can (and should) be orthogonal but the implementation is incomplete. Ira > > Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 14, 2015 at 01:43:11PM -0400, ira.weiny wrote: > A failure to init port 2 ends up ends up "killing" port 1 and releasing the > device associated resources. Yes, that is the only reasonable thing that could happen. init failure should only be possible under exceptional cases (OOM). The only system response is to call ib_umad_add_one again - so of course the first call had to completely clean up everything it did. Hopefully all these errors propogate enough so that driver insmod fails with a perfect clean up. Otherwise it is broken :| Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> Yes, that is the only reasonable thing that could happen. > > init failure should only be possible under exceptional cases (OOM). > > The only system response is to call ib_umad_add_one again - so of > course the first call had to completely clean up everything it did. A reasonable follow up change would be to replace the add device callbacks with add port callbacks. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 14, 2015 at 06:02:47PM +0000, Hefty, Sean wrote: > > Yes, that is the only reasonable thing that could happen. > > > > init failure should only be possible under exceptional cases (OOM). > > > > The only system response is to call ib_umad_add_one again - so of > > course the first call had to completely clean up everything it did. > > A reasonable follow up change would be to replace the add device > callbacks with add port callbacks. Yes, combined with a port argument to ib_set_client_data / ib_get_client_data it would be a nice simplifying clean up. It would be nice to have sane error handling too :( In an ideal world the add call back should return an error and the thing that triggered it should unwind back to module load failure. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/14/2015 08:21 PM, Jason Gunthorpe wrote: > On Tue, Apr 14, 2015 at 06:02:47PM +0000, Hefty, Sean wrote: >>> Yes, that is the only reasonable thing that could happen. >>> >>> init failure should only be possible under exceptional cases (OOM). >>> >>> The only system response is to call ib_umad_add_one again - so of >>> course the first call had to completely clean up everything it did. >> >> A reasonable follow up change would be to replace the add device >> callbacks with add port callbacks. > > Yes, combined with a port argument to ib_set_client_data / > ib_get_client_data it would be a nice simplifying clean up. > > It would be nice to have sane error handling too :( In an ideal world > the add call back should return an error and the thing that triggered > it should unwind back to module load failure. We can give client->add() callback a return value and make ib_register_device() return -ENOMEM when it failed, just wondering why we don't do this at first, any special reason? Regards, Michael Wang > > Jason > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Apr 15, 2015 at 09:58:18AM +0200, Michael Wang wrote: > We can give client->add() callback a return value and make > ib_register_device() return -ENOMEM when it failed, just wondering > why we don't do this at first, any special reason? No idea, but having ib_register_device fail and unwind if a client fails to attach makes sense to me. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Apr 16, 2015 at 9:44 AM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote: >> We can give client->add() callback a return value and make >> ib_register_device() return -ENOMEM when it failed, just wondering >> why we don't do this at first, any special reason? > No idea, but having ib_register_device fail and unwind if a client > fails to attach makes sense to me. It seems a bit unfriendly to fail an entire device if one ULP has a problem. Let's say you have a system whose main network connection is IPoIB. Would you want that connection to come up even if, say, the NFS/RDMA server fails to find the memory registration type it likes? - R. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> > On Wed, Apr 15, 2015 at 09:58:18AM +0200, Michael Wang wrote: > > > We can give client->add() callback a return value and make > > ib_register_device() return -ENOMEM when it failed, just wondering why > > we don't do this at first, any special reason? > > No idea, but having ib_register_device fail and unwind if a client fails to attach > makes sense to me. Yes that is what we should do _but_ I think we should tackle that in a different series. As you said in another email, this series is getting very long and hard to review/prove is correct. This is why I was advocating keeping a check at the top of cm_add_one which verified all Ports supported the CM. This is the current logic today and is proven to work for the devices/use cases out there. We can clean up the initialization code and implement support for individual ports in follow on patches. Ira -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
PiA+IE5vIGlkZWEsIGJ1dCBoYXZpbmcgaWJfcmVnaXN0ZXJfZGV2aWNlIGZhaWwgYW5kIHVud2lu ZCBpZiBhIGNsaWVudA0KPiA+IGZhaWxzIHRvIGF0dGFjaCBtYWtlcyBzZW5zZSB0byBtZS4NCj4g DQo+IEl0IHNlZW1zIGEgYml0IHVuZnJpZW5kbHkgdG8gZmFpbCBhbiBlbnRpcmUgZGV2aWNlIGlm IG9uZSBVTFAgaGFzIGENCj4gcHJvYmxlbS4gIExldCdzIHNheSB5b3UgaGF2ZSBhIHN5c3RlbSB3 aG9zZSBtYWluIG5ldHdvcmsgY29ubmVjdGlvbiBpcw0KPiBJUG9JQi4gIFdvdWxkIHlvdSB3YW50 IHRoYXQgY29ubmVjdGlvbiB0byBjb21lIHVwIGV2ZW4gaWYsIHNheSwgdGhlDQo+IE5GUy9SRE1B IHNlcnZlciBmYWlscyB0byBmaW5kIHRoZSBtZW1vcnkgcmVnaXN0cmF0aW9uIHR5cGUgaXQgbGlr ZXM/DQoNCldoYXQncyBtaXNzaW5nIGlzIHNvbWUgd2F5IGZvciB0aGUgZGV2aWNlIHRvIGluZGlj YXRlIHdoaWNoIG1vZHVsZXMgYXJlIGFjdHVhbGx5IG5lY2Vzc2FyeSBmb3IgaXQgdG8gcnVuLiAg V2l0aG91dCBoYXZpbmcgc29tZXRoaW5nIGxpa2UgdGhhdCwgSSBhZ3JlZSB3aXRoIFJvbGFuZC4N Cg0KLSBTZWFuIA0K -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Apr 16, 2015 at 10:02:46AM -0700, Roland Dreier wrote: > On Thu, Apr 16, 2015 at 9:44 AM, Jason Gunthorpe > <jgunthorpe@obsidianresearch.com> wrote: > >> We can give client->add() callback a return value and make > >> ib_register_device() return -ENOMEM when it failed, just wondering > >> why we don't do this at first, any special reason? > > > No idea, but having ib_register_device fail and unwind if a client > > fails to attach makes sense to me. > > It seems a bit unfriendly to fail an entire device if one ULP has a > problem. Let's say you have a system whose main network connection is > IPoIB. Would you want that connection to come up even if, say, the > NFS/RDMA server fails to find the memory registration type it likes? That is true, but we also never test any of the cases where one expected ULP fails to load but another one needs it. Like IPoIB needs sa_query, for instance. I'm not saying ULPs should error for soft reasons, like your NFS example, this unwind would be for OOM, or other 'impossible' class errors. That said, the driver core does not fail a bus driver register if a client probe fails (we could copy this), but it also doesn't just silently eat the error code either. What about ib_register_client? If any add's fail should it unwind and fail upwards? Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, Roland Thanks for the comment :-) On 04/16/2015 07:02 PM, Roland Dreier wrote: > On Thu, Apr 16, 2015 at 9:44 AM, Jason Gunthorpe > <jgunthorpe@obsidianresearch.com> wrote: >>> We can give client->add() callback a return value and make >>> ib_register_device() return -ENOMEM when it failed, just wondering >>> why we don't do this at first, any special reason? > >> No idea, but having ib_register_device fail and unwind if a client >> fails to attach makes sense to me. > > It seems a bit unfriendly to fail an entire device if one ULP has a > problem. Let's say you have a system whose main network connection is > IPoIB. Would you want that connection to come up even if, say, the > NFS/RDMA server fails to find the memory registration type it likes? Agree, the idea is correct that one client's initialization failure should not influence the whole device, as long as the rest client can keep the device working (but how to estimate that...). While just ignore the failure seems really strange... Regards, Michael Wang > > - R. > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/16/2015 07:05 PM, Weiny, Ira wrote: >> >> On Wed, Apr 15, 2015 at 09:58:18AM +0200, Michael Wang wrote: >> >>> We can give client->add() callback a return value and make >>> ib_register_device() return -ENOMEM when it failed, just wondering why >>> we don't do this at first, any special reason? >> >> No idea, but having ib_register_device fail and unwind if a client fails to attach >> makes sense to me. > > Yes that is what we should do _but_ > > I think we should tackle that in a different series. > > As you said in another email, this series is getting very long and hard to review/prove is correct. This is why I was advocating keeping a check at the top of cm_add_one which verified all Ports supported the CM. This is the current logic today and is proven to work for the devices/use cases out there. > > We can clean up the initialization code and implement support for individual ports in follow on patches. Agree, as long as this series do not introduce any Bug, I suggest we put other reform ideas into next series :-) We have already eliminate the old inferring way and integrate all the cases into helpers, further reform should be far more clear based on this foundation. Regards, Michael Wang > > Ira > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 63418ee..0d0fc24 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -3761,7 +3761,11 @@ static void cm_add_one(struct ib_device *ib_device) unsigned long flags; int ret; u8 i; - int count = 0; + + for (i = 1; i <= ib_device->phys_port_cnt; i++) { + if (!cap_ib_cm(ib_device, i)) + return; + } cm_dev = kzalloc(sizeof(*cm_dev) + sizeof(*port) * ib_device->phys_port_cnt, GFP_KERNEL); @@ -3810,14 +3814,6 @@ static void cm_add_one(struct ib_device *ib_device) ret = ib_modify_port(ib_device, i, 0, &port_modify); if (ret) goto error3; - - count++; - } - - if (!count) { - device_unregister(cm_dev->device); - kfree(cm_dev); - return; } ib_set_client_data(ib_device, &cm_client, cm_dev); > > Thinking about it some more, cap_foo_dev only makes sense if all ports > are either true or false. Mixed is a BUG. Agree After more thought and reading other opinions, I must agree we should not have cap_foo_dev. For the CM case which has some need to support itself device device wide what about this patch (compile tested only): 10:03:57 > git di diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 4b083f5..7347445 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -1639,7 +1639,7 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv, struct rdma_cm_id *id; int ret; - if (cma_family(id_priv) == AF_IB && !cap_ib_cm_dev(cma_dev->device)) + if (cma_family(id_priv) == AF_IB && !cap_ib_cm_any_port(cma_dev->device)) return; id = rdma_create_id(cma_listen_handler, id_priv, id_priv->id.ps, @@ -2030,7 +2030,7 @@ static int cma_bind_loopback(struct rdma_id_private *id_priv) mutex_lock(&lock); list_for_each_entry(cur_dev, &dev_list, list) { if (cma_family(id_priv) == AF_IB && - !cap_ib_cm_dev(cur_dev->device)) + !cap_ib_cm_any_port(cur_dev->device)) continue; if (!cma_dev) diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c index 065405e..dc4caae 100644 --- a/drivers/infiniband/core/ucm.c +++ b/drivers/infiniband/core/ucm.c @@ -1253,7 +1253,7 @@ static void ib_ucm_add_one(struct ib_device *device) dev_t base; struct ib_ucm_device *ucm_dev; - if (!device->alloc_ucontext || !cap_ib_cm_dev(device)) + if (!device->alloc_ucontext || !cap_ib_cm_any_port(device)) return; ucm_dev = kzalloc(sizeof *ucm_dev, GFP_KERNEL); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 3cc3f53..a8fa1f5 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1920,15 +1920,14 @@ static inline int cap_read_multi_sge(struct ib_device *device, u8 port_num) } /** - * cap_ib_cm_dev - Check if any port of device has the capability Infiniband - * Communication Manager. + * cap_ib_cm_any_port - Check if any port of the device has Infiniband + * Communication Manager (CM) support. * * @device: Device to be checked * - * Return 0 when all port of the device don't support Infiniband - * Communication Manager. + * Return 1 if any port of the device supports the IB CM. */ -static inline int cap_ib_cm_dev(struct ib_device *device) +static inline int cap_ib_cm_any_port(struct ib_device *device) { int i;