diff mbox

[RFC,1/3] RDMA: Add netlink infrastructure

Message ID 1305303525-11113-2-git-send-email-roland@kernel.org (mailing list archive)
State RFC, archived
Headers show

Commit Message

Roland Dreier May 13, 2011, 4:18 p.m. UTC
From: Roland Dreier <roland@purestorage.com>

[Dave please do not apply even if this ends up in netdev patchwork!]

Add basic RDMA netlink infrastructure that allows for registration of
RDMA clients for which data is to be exported and supplies message
construction callbacks.

NOT-Signed-off-by: Nir Muchtar <nirm@voltaire.com>

[ Rename and reorganize a few things.  - Roland ]

NOT-Signed-off-by: Roland Dreier <roland@purestorage.com>
---
 drivers/infiniband/core/Makefile  |    2 +-
 drivers/infiniband/core/device.c  |   11 ++
 drivers/infiniband/core/netlink.c |  190 +++++++++++++++++++++++++++++++++++++
 include/linux/netlink.h           |    1 +
 include/rdma/rdma_netlink.h       |   64 +++++++++++++
 5 files changed, 267 insertions(+), 1 deletions(-)
 create mode 100644 drivers/infiniband/core/netlink.c
 create mode 100644 include/rdma/rdma_netlink.h

Comments

Joe Perches May 13, 2011, 4:44 p.m. UTC | #1
On Fri, 2011-05-13 at 09:18 -0700, Roland Dreier wrote:
> From: Roland Dreier <roland@purestorage.com>
> [Dave please do not apply even if this ends up in netdev patchwork!]

Just trivia:

> diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c
[]
> +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__

Using #define pr_fmt(fmt) KBUILD_MODNAME ":%s: " fmt, __func__
generally produces smaller overall object size, especially
at 64 bit.

For instance, here's the size of netlink.o at 32 bit:

$ size drivers/infiniband/core/netlink.o.*
   text	   data	    bss	    dec	    hex	filename
   2663	    153	    736	   3552	    de0	drivers/infiniband/core/netlink.o.old
   2640	    153	    736	   3529	    dc9	drivers/infiniband/core/netlink.o.new

Also, I rarely find __func__ useful in message output.
It may be more useful for active development/debugging.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche May 13, 2011, 5:18 p.m. UTC | #2
On Fri, May 13, 2011 at 6:44 PM, Joe Perches <joe@perches.com> wrote:
> On Fri, 2011-05-13 at 09:18 -0700, Roland Dreier wrote:
>> From: Roland Dreier <roland@purestorage.com>
>> [Dave please do not apply even if this ends up in netdev patchwork!]
>
> Just trivia:
>
>> diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c
> []
>> +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
>
> Using #define pr_fmt(fmt) KBUILD_MODNAME ":%s: " fmt, __func__
> generally produces smaller overall object size, especially
> at 64 bit.
>
> For instance, here's the size of netlink.o at 32 bit:
>
> $ size drivers/infiniband/core/netlink.o.*
>   text    data     bss     dec     hex filename
>   2663     153     736    3552     de0 drivers/infiniband/core/netlink.o.old
>   2640     153     736    3529     dc9 drivers/infiniband/core/netlink.o.new
>
> Also, I rarely find __func__ useful in message output.
> It may be more useful for active development/debugging.

A recent dynamic debug patch made it possible to enable/disable at
runtime whether or not the function name (and more) should be included
in the output. See also http://lwn.net/Articles/434833/ for more
information.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hefty, Sean May 13, 2011, 5:19 p.m. UTC | #3
> +#define NETLINK_INFINIBAND	20

Would NETLINK_RDMA be better?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Roland Dreier May 13, 2011, 5:26 p.m. UTC | #4
On Fri, May 13, 2011 at 10:19 AM, Hefty, Sean <sean.hefty@intel.com> wrote:
>> +#define NETLINK_INFINIBAND   20
>
> Would NETLINK_RDMA be better?

Yes, I tried to change all the externally visible examples of IB to RDMA, but I
missed that one (I figure internal kernel APIs can be cleaned up
later).  I'll update my tree.

 - R.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Joe Perches May 13, 2011, 5:36 p.m. UTC | #5
On Fri, 2011-05-13 at 19:18 +0200, Bart Van Assche wrote:
> On Fri, May 13, 2011 at 6:44 PM, Joe Perches <joe@perches.com> wrote:
> > On Fri, 2011-05-13 at 09:18 -0700, Roland Dreier wrote:
> >> From: Roland Dreier <roland@purestorage.com>
> >> [Dave please do not apply even if this ends up in netdev patchwork!]
> >> diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c
> > []
> >> +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
> > Using #define pr_fmt(fmt) KBUILD_MODNAME ":%s: " fmt, __func__
> > generally produces smaller overall object size, especially
> > at 64 bit.
> > For instance, here's the size of netlink.o at 32 bit:
> > $ size drivers/infiniband/core/netlink.o.*
> >   text    data     bss     dec     hex filename
> >   2663     153     736    3552     de0 drivers/infiniband/core/netlink.o.old
> >   2640     153     736    3529     dc9 drivers/infiniband/core/netlink.o.new
> > Also, I rarely find __func__ useful in message output.
> > It may be more useful for active development/debugging.
> A recent dynamic debug patch made it possible to enable/disable at
> runtime whether or not the function name (and more) should be included
> in the output. See also http://lwn.net/Articles/434833/ for more
> information.

One long term goal for me is a generic run-time mechanism
to prefix all pr_<level> uses not just the <foo>_dbg ones
with or without module or function name.

It will also allow the removal of all the uses of
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

New #define pr_fmt lines need to be added to the current
files that use pr_<level> without a prefix so it could
take awhile.

Eventually.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche May 13, 2011, 6:12 p.m. UTC | #6
On Fri, May 13, 2011 at 7:36 PM, Joe Perches <joe@perches.com> wrote:
> On Fri, 2011-05-13 at 19:18 +0200, Bart Van Assche wrote:
>> A recent dynamic debug patch made it possible to enable/disable at
>> runtime whether or not the function name (and more) should be included
>> in the output. See also http://lwn.net/Articles/434833/ for more
>> information.
>
> One long term goal for me is a generic run-time mechanism
> to prefix all pr_<level> uses not just the <foo>_dbg ones
> with or without module or function name.
>
> It will also allow the removal of all the uses of
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> New #define pr_fmt lines need to be added to the current
> files that use pr_<level> without a prefix so it could
> take awhile.

Something like the +m flag documented in
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/dynamic-debug-howto.txt;hb=HEAD
?

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Joe Perches May 13, 2011, 6:40 p.m. UTC | #7
On Fri, 2011-05-13 at 20:12 +0200, Bart Van Assche wrote:
> On Fri, May 13, 2011 at 7:36 PM, Joe Perches <joe@perches.com> wrote:
> > On Fri, 2011-05-13 at 19:18 +0200, Bart Van Assche wrote:
> > One long term goal for me is a generic run-time mechanism
> > to prefix all pr_<level> uses not just the <foo>_dbg ones
> > with or without module or function name.
> > It will also allow the removal of all the uses of
> > #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > New #define pr_fmt lines need to be added to the current
> > files that use pr_<level> without a prefix so it could
> > take awhile.
> Something like the +m flag documented in
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/dynamic-debug-howto.txt;hb=HEAD
> ?

Yes, akin to that, but using /proc not debugfs.

Basically, just per module on/off prefixing of
module name and optionally function name via a
CONFIG_PR_LEVEL_STORE_FUNCTION_NAME.

I'd like to remove all current specific uses of
__func__ in pr_<level> and use per module
/proc controls to add them back as requested.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index cb1ab3e..c8bbaef 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -8,7 +8,7 @@  obj-$(CONFIG_INFINIBAND_USER_ACCESS) +=	ib_uverbs.o ib_ucm.o \
 					$(user_access-y)
 
 ib_core-y :=			packer.o ud_header.o verbs.o sysfs.o \
-				device.o fmr_pool.o cache.o
+				device.o fmr_pool.o cache.o netlink.o
 ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
 
 ib_mad-y :=			mad.o smi.o agent.o mad_rmpp.o
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 46a7a3f..635c338 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -38,6 +38,7 @@ 
 #include <linux/slab.h>
 #include <linux/init.h>
 #include <linux/mutex.h>
+#include <rdma/rdma_netlink.h>
 
 #include "core_priv.h"
 
@@ -736,8 +737,17 @@  static int __init ib_core_init(void)
 		goto err_sysfs;
 	}
 
+	ret = ibnl_init();
+	if (ret) {
+		printk(KERN_WARNING "Couldn't init IB netlink interface\n");
+		goto err_cache;
+	}
+
 	return 0;
 
+err_cache:
+	ib_cache_cleanup();
+
 err_sysfs:
 	ib_sysfs_cleanup();
 
@@ -748,6 +758,7 @@  err:
 
 static void __exit ib_core_cleanup(void)
 {
+	ibnl_cleanup();
 	ib_cache_cleanup();
 	ib_sysfs_cleanup();
 	/* Make sure that any pending umem accounting work is done. */
diff --git a/drivers/infiniband/core/netlink.c b/drivers/infiniband/core/netlink.c
new file mode 100644
index 0000000..361e29f
--- /dev/null
+++ b/drivers/infiniband/core/netlink.c
@@ -0,0 +1,190 @@ 
+/*
+ * Copyright (c) 2010 Voltaire Inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
+
+#include <net/netlink.h>
+#include <net/net_namespace.h>
+#include <net/sock.h>
+#include <rdma/rdma_netlink.h>
+
+struct ibnl_client {
+	struct list_head		list;
+	int				index;
+	int				nops;
+	const struct ibnl_client_cbs   *cb_table;
+};
+
+static DEFINE_MUTEX(ibnl_mutex);
+static struct sock *nls;
+static LIST_HEAD(client_list);
+
+int ibnl_add_client(int index, int nops,
+		    const struct ibnl_client_cbs cb_table[])
+{
+	struct ibnl_client *cur;
+	struct ibnl_client *nl_client;
+
+	nl_client = kmalloc(sizeof *nl_client, GFP_KERNEL);
+	if (!nl_client)
+		return -ENOMEM;
+
+	nl_client->index	= index;
+	nl_client->nops		= nops;
+	nl_client->cb_table	= cb_table;
+
+	mutex_lock(&ibnl_mutex);
+
+	list_for_each_entry(cur, &client_list, list) {
+		if (cur->index == index) {
+			pr_warn("Client for %d already exists\n", index);
+			mutex_unlock(&ibnl_mutex);
+			kfree(nl_client);
+			return -EINVAL;
+		}
+	}
+
+	list_add_tail(&nl_client->list, &client_list);
+
+	mutex_unlock(&ibnl_mutex);
+
+	return 0;
+}
+EXPORT_SYMBOL(ibnl_add_client);
+
+int ibnl_remove_client(int index)
+{
+	struct ibnl_client *cur, *next;
+
+	mutex_lock(&ibnl_mutex);
+	list_for_each_entry_safe(cur, next, &client_list, list) {
+		if (cur->index == index) {
+			list_del(&(cur->list));
+			mutex_unlock(&ibnl_mutex);
+			kfree(cur);
+			return 0;
+		}
+	}
+	pr_warn("Can't remove callback for client idx %d. Not found\n", index);
+	mutex_unlock(&ibnl_mutex);
+
+	return -EINVAL;
+}
+EXPORT_SYMBOL(ibnl_remove_client);
+
+void *ibnl_put_msg(struct sk_buff *skb, struct nlmsghdr **nlh, int seq,
+		   int len, int client, int op)
+{
+	unsigned char *prev_tail;
+
+	prev_tail = skb_tail_pointer(skb);
+	*nlh = NLMSG_NEW(skb, 0, seq, RDMA_NL_GET_TYPE(client, op),
+			len, NLM_F_MULTI);
+	(*nlh)->nlmsg_len = skb_tail_pointer(skb) - prev_tail;
+	return NLMSG_DATA(*nlh);
+
+nlmsg_failure:
+	nlmsg_trim(skb, prev_tail);
+	return NULL;
+}
+EXPORT_SYMBOL(ibnl_put_msg);
+
+int ibnl_put_attr(struct sk_buff *skb, struct nlmsghdr *nlh,
+		  int len, void *data, int type)
+{
+	unsigned char *prev_tail;
+
+	prev_tail = skb_tail_pointer(skb);
+	NLA_PUT(skb, type, len, data);
+	nlh->nlmsg_len += skb_tail_pointer(skb) - prev_tail;
+	return 0;
+
+nla_put_failure:
+	nlmsg_trim(skb, prev_tail - nlh->nlmsg_len);
+	return -EMSGSIZE;
+}
+EXPORT_SYMBOL(ibnl_put_attr);
+
+static int ibnl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
+{
+	struct ibnl_client *client;
+	int type = nlh->nlmsg_type;
+	int index = RDMA_NL_GET_CLIENT(type);
+	int op = RDMA_NL_GET_OP(type);
+
+	list_for_each_entry(client, &client_list, list) {
+		if (client->index == index) {
+			if (op < 0 || op >= client->nops ||
+			    !client->cb_table[RDMA_NL_GET_OP(op)].dump)
+				return -EINVAL;
+			return netlink_dump_start(nls, skb, nlh,
+						  client->cb_table[op].dump,
+						  NULL);
+		}
+	}
+
+	pr_info("Index %d wasn't found in client list\n", index);
+	return -EINVAL;
+}
+
+static void ibnl_rcv(struct sk_buff *skb)
+{
+	mutex_lock(&ibnl_mutex);
+	netlink_rcv_skb(skb, &ibnl_rcv_msg);
+	mutex_unlock(&ibnl_mutex);
+}
+
+int ibnl_init(void)
+{
+	nls = netlink_kernel_create(&init_net, NETLINK_INFINIBAND, 0, ibnl_rcv,
+				    NULL, THIS_MODULE);
+	if (!nls) {
+		pr_warn("Failed to create netlink socket\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+void ibnl_cleanup(void)
+{
+	struct ibnl_client *cur, *next;
+
+	mutex_lock(&ibnl_mutex);
+	list_for_each_entry_safe(cur, next, &client_list, list) {
+		list_del(&(cur->list));
+		kfree(cur);
+	}
+	mutex_unlock(&ibnl_mutex);
+
+	netlink_kernel_release(nls);
+}
diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 4c4ac3f..b1e3e59 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -24,6 +24,7 @@ 
 /* leave room for NETLINK_DM (DM Events) */
 #define NETLINK_SCSITRANSPORT	18	/* SCSI Transports */
 #define NETLINK_ECRYPTFS	19
+#define NETLINK_INFINIBAND	20
 
 #define MAX_LINKS 32		
 
diff --git a/include/rdma/rdma_netlink.h b/include/rdma/rdma_netlink.h
new file mode 100644
index 0000000..c983a19
--- /dev/null
+++ b/include/rdma/rdma_netlink.h
@@ -0,0 +1,64 @@ 
+#ifndef _RDMA_NETLINK_H
+#define _RDMA_NETLINK_H
+
+#define RDMA_NL_GET_CLIENT(type) ((type & (((1 << 6) - 1) << 10)) >> 10)
+#define RDMA_NL_GET_OP(type) (type & ((1 << 10) - 1))
+#define RDMA_NL_GET_TYPE(client, op) ((client << 10) + op)
+
+#ifdef __KERNEL__
+
+#include <linux/netlink.h>
+
+struct ibnl_client_cbs {
+	int (*dump)(struct sk_buff *skb, struct netlink_callback *nlcb);
+};
+
+int ibnl_init(void);
+void ibnl_cleanup(void);
+
+/**
+ * Add a a client to the list of IB netlink exporters.
+ * @index: Index of the added client
+ * @nops: Number of supported ops by the added client.
+ * @cb_table: A table for op->callback
+ *
+ * Returns 0 on success or a negative error code.
+ */
+int ibnl_add_client(int index, int nops,
+		    const struct ibnl_client_cbs cb_table[]);
+
+/**
+ * Remove a client from IB netlink.
+ * @index: Index of the removed IB client.
+ *
+ * Returns 0 on success or a negative error code.
+ */
+int ibnl_remove_client(int index);
+
+/**
+ * Put a new message in a supplied skb.
+ * @skb: The netlink skb.
+ * @nlh: Pointer to put the header of the new netlink message.
+ * @seq: The message sequence number.
+ * @len: The requested message length to allocate.
+ * @client: Calling IB netlink client.
+ * @op: message content op.
+ * Returns the allocated buffer on success and NULL on failure.
+ */
+void *ibnl_put_msg(struct sk_buff *skb, struct nlmsghdr **nlh, int seq,
+		   int len, int client, int op);
+/**
+ * Put a new attribute in a supplied skb.
+ * @skb: The netlink skb.
+ * @nlh: Header of the netlink message to append the attribute to.
+ * @len: The length of the attribute data.
+ * @data: The attribute data to put.
+ * @type: The attribute type.
+ * Returns the 0 and a negative error code on failure.
+ */
+int ibnl_put_attr(struct sk_buff *skb, struct nlmsghdr *nlh,
+		  int len, void *data, int type);
+
+#endif /* __KERNEL__ */
+
+#endif /* _RDMA_NETLINK_H */