Message ID | 20150611230901.16479.18231.stgit@phlsvslse11.ph.intel.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
On Fri, Jun 12, 2015 at 2:09 AM, Mike Marciniszyn <mike.marciniszyn@intel.com> wrote: > +++ b/drivers/infiniband/hw/hfi1/device.c > +int __init dev_init(void) > +{ > + int ret; > + > + ret = alloc_chrdev_region(&hfi1_dev, 0, HFI1_NMINORS, DRIVER_NAME); > + if (ret < 0) { > + pr_err("Could not allocate chrdev region (err %d)\n", -ret); > + goto done; > + } > + > + class = class_create(THIS_MODULE, class_name()); > + if (IS_ERR(class)) { > + ret = PTR_ERR(class); > + pr_err("Could not create device class (err %d)\n", -ret); > + unregister_chrdev_region(hfi1_dev, HFI1_NMINORS); > + } > + > +done: > + return ret; > +} so what's the role of the char-device? why should a low-level driver which is part of the upstream RDMA stack contain a char-device? Or. > + > +void dev_cleanup(void) > +{ > + if (class) { > + class_destroy(class); > + class = NULL; > + } > + > + unregister_chrdev_region(hfi1_dev, HFI1_NMINORS); > +} -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> > +int __init dev_init(void) > > +{ > > + int ret; > > + > > + ret = alloc_chrdev_region(&hfi1_dev, 0, HFI1_NMINORS, > DRIVER_NAME); > > + if (ret < 0) { > > + pr_err("Could not allocate chrdev region (err %d)\n", - > ret); > > + goto done; > > + } > > + > > + class = class_create(THIS_MODULE, class_name()); > > + if (IS_ERR(class)) { > > + ret = PTR_ERR(class); > > + pr_err("Could not create device class (err %d)\n", - > ret); > > + unregister_chrdev_region(hfi1_dev, HFI1_NMINORS); > > + } > > + > > +done: > > + return ret; > > +} > > so what's the role of the char-device? Someone from Mike's team can chime in, but I believe this supports non-verbs interfaces (i.e. PSM), similar to what's done for qib.
On Sun, Jun 14, 2015 at 11:58:00PM +0300, Or Gerlitz wrote: > On Fri, Jun 12, 2015 at 2:09 AM, Mike Marciniszyn > <mike.marciniszyn@intel.com> wrote: > > +++ b/drivers/infiniband/hw/hfi1/device.c > > > +int __init dev_init(void) > > +{ > > + int ret; > > + > > + ret = alloc_chrdev_region(&hfi1_dev, 0, HFI1_NMINORS, DRIVER_NAME); > > + if (ret < 0) { > > + pr_err("Could not allocate chrdev region (err %d)\n", -ret); > > + goto done; > > + } > > + > > + class = class_create(THIS_MODULE, class_name()); > > + if (IS_ERR(class)) { > > + ret = PTR_ERR(class); > > + pr_err("Could not create device class (err %d)\n", -ret); > > + unregister_chrdev_region(hfi1_dev, HFI1_NMINORS); > > + } > > + > > +done: > > + return ret; > > +} > > so what's the role of the char-device? why should a low-level driver > which is part of the upstream RDMA stack contain a char-device? I for one would like to see a summary of any non-standard UAPI from this driver. Was Al Viro's comment on qib addressed here? That would be a showstopper for me.. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> Was Al Viro's comment on qib addressed here? That would be a > showstopper for me.. Do you have a link to this comment? I'm missing a bunch of messages from this thread and can't find anything from Al in the logs. - Sean -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 15, 2015 at 06:11:02PM +0000, Hefty, Sean wrote: > > Was Al Viro's comment on qib addressed here? That would be a > > showstopper for me.. > > Do you have a link to this comment? I'm missing a bunch of messages from this thread and can't find anything from Al in the logs. Author: Al Viro <viro@zeniv.linux.org.uk> Date: Sat Apr 4 00:11:32 2015 -0400 infinibad: weird APIs switched to ->write_iter() Things Not To Do When Writing A Driver, part 1001st: have writev() and write() on the same file doing completely different things. As in, "interpret very different sets of commands". We _can_ handle that, but it's a bloody bad idea. Don't do that in new drivers. Ever. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Jasn -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Do you have a link to this comment? I'm missing a bunch of messages from > this thread and can't find anything from Al in the logs. > This commit appeared in qib and it did not appear on linux-rdma. I never saw it until it appeared. I emailed on list to Al in http://marc.info/?l=linux-rdma&m=143024595127255&w=2. Mike -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> > > Do you have a link to this comment? I'm missing a bunch of messages > > > from > > this thread and can't find anything from Al in the logs. > > > > This commit appeared in qib and it did not appear on linux-rdma. I never saw it > until it appeared. > > I emailed on list to Al in http://marc.info/?l=linux- > rdma&m=143024595127255&w=2. > Jason, what is your take on ioctl vs. write. PSM2 (and PSM) uses this to dialog with the driver for hardware specific setup. I'm suspected at some point in the past, this was ioctl based and changed due to bias against ioctl. Mike -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> PSM2 (and PSM) uses this to dialog with the driver for hardware specific > setup. > > I'm suspected at some point in the past, this was ioctl based and changed > due to bias against ioctl. Could this be distinguished based on a common command header? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 17, 2015 at 12:05:08PM +0000, Marciniszyn, Mike wrote:
> Jason, what is your take on ioctl vs. write.
Well, I think the global view keeps changing, so I'm not sure what is
trendy now..
But personally, I hate seeing write() used to emulate ioctl() because
'ioctl is bad'. ie if you are 'writing' a struct that contains user
pointers and you expect the kernel to read/write to user memory, then
use ioctl. (and that is the 'badness' of ioctl, so pretending it is
write doesn't help anything)
If you can formulate your communication like netlink does, where it is
more like a network RPC, where *everything* is in the buffer passed
to write and you have to call read to get a reply, then use
read/write. This is generally considered preferable, and is what
people mean when they say write is preferred. As I understand
it.
netlink is a reasonable low speed format to use for this kind of
serialization, either via the common mux or via your own char device.
I also wonder about all those sysfs files. I think the over reliance
on sysfs in rdma may have been a mistake, sending the same information
over netlink would be more consistent with what netdev is doing. (eg
you can't view the netdev IP addresses via sysfs, but you can view
rdma guids via sysfs)
Overall, I'd be alot happier with your driver patchset if it was split
as 'core only, no UAPI changes' followed up by 'Add UAPI for X'.
If you can't make a verbs provider work under 'core only' then I think
something is very wrong...
UAPI stuff in drivers is often a red flag, and you guys should think
really carefully about what OPA elements should be buried in the
driver and what elements should be common to all OPA adapters.
Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
> netlink is a reasonable low speed format to use for this kind of serialization, > either via the common mux or via your own char device. > A couple of follow-up netlink questions. 1. I assume you are talking about generic netlink vs. say the RDMA netlink. The generic netlink handles dynamic id assignment which would seem to make it a better choice. 2. Can you clarify "via your own char device"? I think netlink is char device independent. For PSM2, the mmap() overload is required for setting up direct hardware access, so we need the device. The difference here is that PSM[12] also uses aio/iter overload for bulk SDMA traffic. The use of write() as a control is consistent with the core. > I also wonder about all those sysfs files. I think the over reliance on sysfs in > rdma may have been a mistake, sending the same information over netlink > would be more consistent with what netdev is doing. (eg you can't view the > netdev IP addresses via sysfs, but you can view rdma guids via sysfs) > The current sysfs usage is consistent with PSM1/qib use with updates for OPA differences. I will update the sysfs docs for both qib and wfr as part of the patch set. > Overall, I'd be alot happier with your driver patchset if it was split as 'core > only, no UAPI changes' followed up by 'Add UAPI for X'. > > If you can't make a verbs provider work under 'core only' then I think > something is very wrong... > > UAPI stuff in drivers is often a red flag, and you guys should think really > carefully about what OPA elements should be buried in the driver and what > elements should be common to all OPA adapters. > By UAPI, do you mean the file in the include/uapi, the sysfs additions, the PSM side of the driver, or all these? Mike -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> > so what's the role of the char-device? why should a low-level driver which is > part of the upstream RDMA stack contain a char-device? > > Or. > Contains additional char devices: - PSM character device - diagnostic character devices The nature of the hardware requires both of these additional devices that don't fit as part of the RDMA stack. Mike
On Wed, Jul 08, 2015 at 09:42:39PM +0000, Marciniszyn, Mike wrote: > > netlink is a reasonable low speed format to use for this kind of serialization, > > either via the common mux or via your own char device. > > > > A couple of follow-up netlink questions. > > 1. I assume you are talking about generic netlink vs. say the RDMA netlink. No, 'common mux' means socket(AF_NETLINK) Your own char device, means stuffing netlink messages over your own read/write scheme on /dev/infiniband/whatever, just to re-use the netlink message processing infrastructure. Where this stuff would live in the common mux space, I don't know. Probably under RDMA, like iWarp does. Depends what it is, I guess. > I think netlink is char device independent. For PSM2, the mmap() > overload is required for setting up direct hardware access, so we > need the device. Right, this is why you might want to run netlink messages over a char device so you have mmap. > The difference here is that PSM[12] also uses aio/iter overload for > bulk SDMA traffic. The use of write() as a control is consistent > with the core. Which is what Al said no to, so it has to go. I would not use the core code as an example of modern tastes on how to best build a UAPI. My feeling is ioctl considered better than abusing write() with hidden pointers as the core does. read/write is prefered to ioctl() if it works in a self contained stream basis (ie no pointers) > > I also wonder about all those sysfs files. I think the over reliance on sysfs in > > rdma may have been a mistake, sending the same information over netlink > > would be more consistent with what netdev is doing. (eg you can't view the > > netdev IP addresses via sysfs, but you can view rdma guids via sysfs) > > The current sysfs usage is consistent with PSM1/qib use with updates > for OPA differences. I will update the sysfs docs for both qib and > wfr as part of the patch set. But it keeps growing, and more new sysfs files are being added for OPA stuff. If we were going to move into a netlink direction then doing that at the moment we introduce a whole new protocol like OPA makes sense to me. > > UAPI stuff in drivers is often a red flag, and you guys should > > think really carefully about what OPA elements should be buried in > > the driver and what elements should be common to all OPA adapters. > > By UAPI, do you mean the file in the include/uapi, the sysfs > additions, the PSM side of the driver, or all these? All of these. Drivers should not be designing UAPIs. It is unfortunate that RDMA requires it, but that isn't an excuse to go hog wild and cram gigantic UAPIs into drivers. Particularly the common OPA stuff living in the driver is unsettling. Especially in your case where there is now a history of 50kloc, highly duplicative, driver submissions every couple of years. Two implementations of the PSM userspace stuff is obnoxious. Buring OPA specific stuff and sysfs files in the driver probably means when hfi2 rolls around there will be two copies of all of that. More obnoxious. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> But personally, I hate seeing write() used to emulate ioctl() because 'ioctl is > bad'. ie if you are 'writing' a struct that contains user pointers and you > expect the kernel to read/write to user memory, then use ioctl. (and that is > the 'badness' of ioctl, so pretending it is write doesn't help anything) > So you are ok with ioctl? Based on the statement above, it appears that you are. mike -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/hw/hfi1/device.c b/drivers/infiniband/hw/hfi1/device.c new file mode 100644 index 0000000..07c87a87 --- /dev/null +++ b/drivers/infiniband/hw/hfi1/device.c @@ -0,0 +1,142 @@ +/* + * + * This file is provided under a dual BSD/GPLv2 license. When using or + * redistributing this file, you may do so under either license. + * + * GPL LICENSE SUMMARY + * + * Copyright(c) 2015 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * BSD LICENSE + * + * Copyright(c) 2015 Intel Corporation. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * - Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * - Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * - Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + */ + +#include <linux/cdev.h> +#include <linux/module.h> +#include <linux/device.h> +#include <linux/fs.h> + +#include "hfi.h" +#include "device.h" + +static struct class *class; +static dev_t hfi1_dev; + +int hfi1_cdev_init(int minor, const char *name, + const struct file_operations *fops, + struct cdev *cdev, struct device **devp) +{ + const dev_t dev = MKDEV(MAJOR(hfi1_dev), minor); + struct device *device = NULL; + int ret; + + cdev_init(cdev, fops); + cdev->owner = THIS_MODULE; + kobject_set_name(&cdev->kobj, name); + + ret = cdev_add(cdev, dev, 1); + if (ret < 0) { + pr_err("Could not add cdev for minor %d, %s (err %d)\n", + minor, name, -ret); + goto done; + } + + device = device_create(class, NULL, dev, NULL, "%s", name); + if (!IS_ERR(device)) + goto done; + ret = PTR_ERR(device); + device = NULL; + pr_err("Could not create device for minor %d, %s (err %d)\n", + minor, name, -ret); + cdev_del(cdev); +done: + *devp = device; + return ret; +} + +void hfi1_cdev_cleanup(struct cdev *cdev, struct device **devp) +{ + struct device *device = *devp; + + if (device) { + device_unregister(device); + *devp = NULL; + + cdev_del(cdev); + } +} + +static const char *hfi1_class_name = "hfi1"; + +const char *class_name(void) +{ + return hfi1_class_name; +} + +int __init dev_init(void) +{ + int ret; + + ret = alloc_chrdev_region(&hfi1_dev, 0, HFI1_NMINORS, DRIVER_NAME); + if (ret < 0) { + pr_err("Could not allocate chrdev region (err %d)\n", -ret); + goto done; + } + + class = class_create(THIS_MODULE, class_name()); + if (IS_ERR(class)) { + ret = PTR_ERR(class); + pr_err("Could not create device class (err %d)\n", -ret); + unregister_chrdev_region(hfi1_dev, HFI1_NMINORS); + } + +done: + return ret; +} + +void dev_cleanup(void) +{ + if (class) { + class_destroy(class); + class = NULL; + } + + unregister_chrdev_region(hfi1_dev, HFI1_NMINORS); +} diff --git a/drivers/infiniband/hw/hfi1/device.h b/drivers/infiniband/hw/hfi1/device.h new file mode 100644 index 0000000..98caecd --- /dev/null +++ b/drivers/infiniband/hw/hfi1/device.h @@ -0,0 +1,61 @@ +#ifndef _HFI1_DEVICE_H +#define _HFI1_DEVICE_H +/* + * + * This file is provided under a dual BSD/GPLv2 license. When using or + * redistributing this file, you may do so under either license. + * + * GPL LICENSE SUMMARY + * + * Copyright(c) 2015 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * BSD LICENSE + * + * Copyright(c) 2015 Intel Corporation. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * - Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * - Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * - Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + */ + +int hfi1_cdev_init(int minor, const char *name, + const struct file_operations *fops, + struct cdev *cdev, struct device **devp); +void hfi1_cdev_cleanup(struct cdev *cdev, struct device **devp); +const char *class_name(void); +int __init dev_init(void); +void dev_cleanup(void); + +#endif /* _HFI1_DEVICE_H */