diff mbox

[v1] libibverbs: Add support for usNIC nodes and transports

Message ID 1390360721-14894-1-git-send-email-umalhi@cisco.com (mailing list archive)
State Accepted, archived
Headers show

Commit Message

Upinder Malhi (umalhi) Jan. 22, 2014, 3:18 a.m. UTC
5db5765e255de4072eb0e35facfeafce53af001b and 180771a3707a4c0577cbf4f830c754dbabfdfccb
on git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git
add IB_NODE_USNIC[_UDP] and IB_TRANSPORT_USNIC[_UDP] support to the kernel.
This patch adds the corresponding support in libibverbs.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
---
 examples/devinfo.c         | 8 +++++---
 include/infiniband/verbs.h | 8 ++++++--
 src/enum_strs.c            | 6 ++++--
 src/init.c                 | 8 +++++++-
 4 files changed, 22 insertions(+), 8 deletions(-)

--
1.8.2.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Hefty, Sean Jan. 22, 2014, 4:04 a.m. UTC | #1
> diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
> index 4b1ab57..ff9a1aa 100644
> --- a/include/infiniband/verbs.h
> +++ b/include/infiniband/verbs.h
> @@ -68,13 +68,17 @@ enum ibv_node_type {
>  	IBV_NODE_CA 		= 1,
>  	IBV_NODE_SWITCH,
>  	IBV_NODE_ROUTER,
> -	IBV_NODE_RNIC
> +	IBV_NODE_RNIC,
> +	IBV_NODE_USNIC,
> +	IBV_NODE_USNIC_UDP,

Are these really different nodes or just different transports for the same NIC?

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Upinder Malhi (umalhi) Jan. 22, 2014, 4:57 p.m. UTC | #2
Sean,
	The underlaying requirement is to have two different transports for usNIC.  However, in verbs, there is a one to one relationship between node type and transport.  Hence, there are two nodes also.

Upinder

On Jan 21, 2014, at 8:04 PM, "Hefty, Sean" <sean.hefty@intel.com> wrote:

>> diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
>> index 4b1ab57..ff9a1aa 100644
>> --- a/include/infiniband/verbs.h
>> +++ b/include/infiniband/verbs.h
>> @@ -68,13 +68,17 @@ enum ibv_node_type {
>> 	IBV_NODE_CA 		= 1,
>> 	IBV_NODE_SWITCH,
>> 	IBV_NODE_ROUTER,
>> -	IBV_NODE_RNIC
>> +	IBV_NODE_RNIC,
>> +	IBV_NODE_USNIC,
>> +	IBV_NODE_USNIC_UDP,
> 
> Are these really different nodes or just different transports for the same NIC?
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hefty, Sean Jan. 22, 2014, 5:32 p.m. UTC | #3
> 	The underlaying requirement is to have two different transports for
> usNIC.  However, in verbs, there is a one to one relationship between node
> type and transport.  Hence, there are two nodes also.

The CA 'node' type supports multiple transports - UD, UC, RC, etc.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Upinder Malhi (umalhi) Jan. 22, 2014, 5:59 p.m. UTC | #4
Sean,
	AFAIK can see from the code, the CA node type corresponds to the "IBV_TRANSPORT_IB."  The packet format might change slightly depending on QP type, but the *base* transport remains IB.

usNIC supports two distinct transports; custom roce and UDP.  Unless I am missing something, the only way to represent these in the verbs code is by having two separate nodes.

Upinder


On Jan 22, 2014, at 9:32 AM, "Hefty, Sean" <sean.hefty@intel.com>
 wrote:

>> 	The underlaying requirement is to have two different transports for
>> usNIC.  However, in verbs, there is a one to one relationship between node
>> type and transport.  Hence, there are two nodes also.
> 
> The CA 'node' type supports multiple transports - UD, UC, RC, etc.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hefty, Sean Jan. 22, 2014, 6:47 p.m. UTC | #5
> 	AFAIK can see from the code, the CA node type corresponds to the
> "IBV_TRANSPORT_IB."  The packet format might change slightly depending on
> QP type, but the *base* transport remains IB.

You're right from an API perspective.  The actual transport layer is reported by verbs using multiple fields: transport + QP type.
 
> usNIC supports two distinct transports; custom roce and UDP.  Unless I am
> missing something, the only way to represent these in the verbs code is by
> having two separate nodes.

Verbs seems broken.  The node hasn't changed, and I don't see why some version of usNIC couldn't run both transport protocols at once.  Maybe a better way of mapping usNIC to verbs is to map custom RoCE (sounds like code for non-compliant) and UDP to a QP type?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Upinder Malhi (umalhi) Jan. 22, 2014, 7:46 p.m. UTC | #6
On Jan 22, 2014, at 10:47 AM, "Hefty, Sean" <sean.hefty@intel.com> wrote:

>> 	AFAIK can see from the code, the CA node type corresponds to the
>> "IBV_TRANSPORT_IB."  The packet format might change slightly depending on
>> QP type, but the *base* transport remains IB.
> 
> You're right from an API perspective.  The actual transport layer is reported by verbs using multiple fields: transport + QP type.

I don't know what you mean.  The transport is independent of the QP type.
static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port)
...
	printf("\ttransport:\t\t\t%s (%d)\n",
	       transport_str(ib_dev->transport_type), ib_dev->transport_type);
…
Am I looking in the right place?

> 

>> usNIC supports two distinct transports; custom roce and UDP.  Unless I am
>> missing something, the only way to represent these in the verbs code is by
>> having two separate nodes.
> 
> Verbs seems broken.  The node hasn't changed, and I don't see why some version of usNIC couldn't run both transport protocols at once.  Maybe a better way of mapping usNIC to verbs is to map custom RoCE (sounds like code for non-compliant) and UDP to a QP type?
Yes, this is a verbs limitation.  Sounds like also something to address in your post-verbs proposal :)
Custom RoCE is indeed Cisco specific.  
The QP type is independent of the transport and even if it wasn't, this change complicates the design.  Having unreliable QP type per transport?  Apps would have to be rewritten for the new QP types.

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hefty, Sean Jan. 22, 2014, 9:12 p.m. UTC | #7
> >> 	AFAIK can see from the code, the CA node type corresponds to the
> >> "IBV_TRANSPORT_IB."  The packet format might change slightly depending
> on
> >> QP type, but the *base* transport remains IB.
> >
> > You're right from an API perspective.  The actual transport layer is
> reported by verbs using multiple fields: transport + QP type.
> 
> I don't know what you mean.  The transport is independent of the QP type.
> static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port)
> ...
> 	printf("\ttransport:\t\t\t%s (%d)\n",
> 	       transport_str(ib_dev->transport_type), ib_dev-
> >transport_type);
> ...
> Am I looking in the right place?

I'm simply saying that, for IB, what we call the QP type is actually the transport level protocol.  IIRC, when iwarp was added, we added transport_type to distinguish between QP type RC being IB or iWarp transport.  The result is that the actual transport level protocol is some combination between QP type and transport_type.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Upinder Malhi (umalhi) Jan. 22, 2014, 9:53 p.m. UTC | #8
Understood that the QP type, transport_type combination is implicit.

I do not believe that a new QP type is justified though, for the reason given.  usNIC HCA UD QPs will either all be on custom RoCE transport or all on UDP.  Mixing is not allowed.

Upinder

On Jan 22, 2014, at 1:12 PM, "Hefty, Sean" <sean.hefty@intel.com>
 wrote:

>>>> 	AFAIK can see from the code, the CA node type corresponds to the
>>>> "IBV_TRANSPORT_IB."  The packet format might change slightly depending
>> on
>>>> QP type, but the *base* transport remains IB.
>>> 
>>> You're right from an API perspective.  The actual transport layer is
>> reported by verbs using multiple fields: transport + QP type.
>> 
>> I don't know what you mean.  The transport is independent of the QP type.
>> static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port)
>> ...
>> 	printf("\ttransport:\t\t\t%s (%d)\n",
>> 	       transport_str(ib_dev->transport_type), ib_dev-
>>> transport_type);
>> ...
>> Am I looking in the right place?
> 
> I'm simply saying that, for IB, what we call the QP type is actually the transport level protocol.  IIRC, when iwarp was added, we added transport_type to distinguish between QP type RC being IB or iWarp transport.  The result is that the actual transport level protocol is some combination between QP type and transport_type.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/examples/devinfo.c b/examples/devinfo.c
index ff078e4..afa8c85 100644
--- a/examples/devinfo.c
+++ b/examples/devinfo.c
@@ -70,9 +70,11 @@  static const char *guid_str(uint64_t node_guid, char *str)
 static const char *transport_str(enum ibv_transport_type transport)
 {
 	switch (transport) {
-	case IBV_TRANSPORT_IB:    return "InfiniBand";
-	case IBV_TRANSPORT_IWARP: return "iWARP";
-	default:		  return "invalid transport";
+	case IBV_TRANSPORT_IB:		return "InfiniBand";
+	case IBV_TRANSPORT_IWARP:	return "iWARP";
+	case IBV_TRANSPORT_USNIC:	return "usNIC";
+	case IBV_TRANSPORT_USNIC_UDP:	return "usNIC UDP";
+	default:			return "invalid transport";
 	}
 }

diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index 4b1ab57..ff9a1aa 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -68,13 +68,17 @@  enum ibv_node_type {
 	IBV_NODE_CA 		= 1,
 	IBV_NODE_SWITCH,
 	IBV_NODE_ROUTER,
-	IBV_NODE_RNIC
+	IBV_NODE_RNIC,
+	IBV_NODE_USNIC,
+	IBV_NODE_USNIC_UDP,
 };

 enum ibv_transport_type {
 	IBV_TRANSPORT_UNKNOWN	= -1,
 	IBV_TRANSPORT_IB	= 0,
-	IBV_TRANSPORT_IWARP
+	IBV_TRANSPORT_IWARP,
+	IBV_TRANSPORT_USNIC,
+	IBV_TRANSPORT_USNIC_UDP,
 };

 enum ibv_device_cap_flags {
diff --git a/src/enum_strs.c b/src/enum_strs.c
index 54d71a6..93ffe32 100644
--- a/src/enum_strs.c
+++ b/src/enum_strs.c
@@ -38,10 +38,12 @@  const char *ibv_node_type_str(enum ibv_node_type node_type)
 		[IBV_NODE_CA]		= "InfiniBand channel adapter",
 		[IBV_NODE_SWITCH]	= "InfiniBand switch",
 		[IBV_NODE_ROUTER]	= "InfiniBand router",
-		[IBV_NODE_RNIC]		= "iWARP NIC"
+		[IBV_NODE_RNIC]		= "iWARP NIC",
+		[IBV_NODE_USNIC]	= "usNIC",
+		[IBV_NODE_USNIC_UDP]	= "usNIC UDP",
 	};

-	if (node_type < IBV_NODE_CA || node_type > IBV_NODE_RNIC)
+	if (node_type < IBV_NODE_CA || node_type > IBV_NODE_USNIC_UDP)
 		return "unknown";

 	return node_type_str[node_type];
diff --git a/src/init.c b/src/init.c
index 8e93f3f..1724d81 100644
--- a/src/init.c
+++ b/src/init.c
@@ -346,7 +346,7 @@  static struct ibv_device *try_driver(struct ibv_driver *driver,
 			dev->node_type = IBV_NODE_UNKNOWN;
 	} else {
 		dev->node_type = strtol(value, NULL, 10);
-		if (dev->node_type < IBV_NODE_CA || dev->node_type > IBV_NODE_RNIC)
+		if (dev->node_type < IBV_NODE_CA || dev->node_type > IBV_NODE_USNIC_UDP)
 			dev->node_type = IBV_NODE_UNKNOWN;
 	}

@@ -359,6 +359,12 @@  static struct ibv_device *try_driver(struct ibv_driver *driver,
 	case IBV_NODE_RNIC:
 		dev->transport_type = IBV_TRANSPORT_IWARP;
 		break;
+	case IBV_NODE_USNIC:
+		dev->transport_type = IBV_TRANSPORT_USNIC;
+		break;
+	case IBV_NODE_USNIC_UDP:
+		dev->transport_type = IBV_TRANSPORT_USNIC_UDP;
+		break;
 	default:
 		dev->transport_type = IBV_TRANSPORT_UNKNOWN;
 		break;