
[RFI] ucmatose: No effect to set service type for QoS

Message ID CAMGffEmP_ouVu5v1cGGENqD9dssgtAqTg39-xa6--NaUkf1aVw@mail.gmail.com (mailing list archive)
State Deferred
Commit Message

Jinpu Wang Aug. 11, 2016, 12:29 p.m. UTC
On Wed, Aug 10, 2016 at 8:52 PM, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
> On 8/9/2016 12:26 PM, Jinpu Wang wrote:
>> Hi Sean,
>>
>> I'm testing QoS support for IB. I notice that ucmatose shows the same
>> performance regardless of the service type I set, whereas setting the
>> SL in ib_send_bw works well (different SLs show different performance,
>> based on the opensm settings).
>>
>> I captured packets using ibdump; it shows that the service level
>> field in the LRH is always 0 when running traffic with ucmatose.
>>
>> When running ib_send_bw, the packets carry the service level I set.
>>
>> It seems rdma_set_service_type stores the tos in id_priv->tos, which is
>> later used to set path_rec->qos_class or traffic_class, but not sl
>> directly. What's the consideration here?
>> code snip:
>>         switch (cma_family(id_priv)) {
>>         case AF_INET:
>>                 path_rec->qos_class = cpu_to_be16((u16) id_priv->tos);
>>                 comp_mask |= IB_SA_PATH_REC_QOS_CLASS;
>>                 break;
>>         case AF_INET6:
>>                 sin6 = (struct sockaddr_in6 *) cma_src_addr(id_priv);
>>                 path_rec->traffic_class = (u8) (be32_to_cpu(sin6->sin6_flowinfo) >> 20);
>>                 comp_mask |= IB_SA_PATH_REC_TRAFFIC_CLASS;
>>                 break;
>>         case AF_IB:
>>                 sib = (struct sockaddr_ib *) cma_src_addr(id_priv);
>>                 path_rec->traffic_class = (u8) (be32_to_cpu(sib->sib_flowinfo) >> 20);
>>
>>
>> Does it make sense to also set the SL here, or is the service type for
>> ucmatose totally different from the SL for ib_send_bw?
>
> I think this is an OpenSM configuration issue. A QoS policy needs to be
> set up to return the proper SL to use for the QoS class or TClass in the
> PathRecord response.
>
> -- Hal
>
Thanks Hal,

Configuring an extra QoS policy seems quite complex.
Do you think the attached patch makes sense?
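
For reference, the mapping in the quoted snippet can be modelled in user space. This is only a stand-alone sketch, not the kernel code: `demo_family`, `demo_qos_fields`, and `demo_fill_qos` are hypothetical names, and only the TOS-to-QoS-class copy and the shift by 20 are taken from the snippet. It illustrates that the requested value lands in the QoS class or traffic class of the query and never in the SL.

```c
#include <stdint.h>
#include <arpa/inet.h>	/* ntohl(), htonl() */

/* Hypothetical stand-ins for the address families in the snippet. */
enum demo_family { DEMO_AF_INET, DEMO_AF_INET6 };

/* Only the two QoS-related PathRecord fields are modelled. */
struct demo_qos_fields {
	uint16_t qos_class;	/* host order here; the kernel stores it big-endian */
	uint8_t traffic_class;
};

static struct demo_qos_fields demo_fill_qos(enum demo_family family,
					    uint8_t tos, uint32_t flowinfo_be)
{
	struct demo_qos_fields f = { 0, 0 };

	switch (family) {
	case DEMO_AF_INET:
		/* IPv4: the TOS byte is passed on as the QoS class. */
		f.qos_class = tos;
		break;
	case DEMO_AF_INET6:
		/* IPv6: drop the 20-bit flow label, keep the 8-bit traffic class. */
		f.traffic_class = (uint8_t)(ntohl(flowinfo_be) >> 20);
		break;
	}
	return f;
}
```

With a flowinfo of 0x0A512345 in host order, the flow label 0x12345 is shifted out and the traffic class is 0xA5. In neither branch is an SL written; the SL comes back from the SA in the PathRecord response.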

Comments

Hal Rosenstock Aug. 11, 2016, 9:15 p.m. UTC | #1
On 8/11/2016 8:29 AM, Jinpu Wang wrote:
> Thanks Hal,
> 
> Configure extra QoS policy seems quite complex.

Configuration complexity varies with the QoS requirements.

Which type of RDMA CM connections are being used (IPv4, IPv6, or native
IB) ?

> Do you think patch attached make sense?

Attached patch doesn't appear to relate to upstream.

It also looks incomplete to me. What invokes rdma_set_service_level ? Is
it some option in ucma.c:ucma_set_option ?

Current patch doesn't appear to me to be backward compatible. If
rdma_set_service_level is not called in flow, then SL should not be set
in SA PR query which is what happens today.

Also, if SL is set in query, you probably don't need some of the other
fields that are being set.

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jinpu Wang Aug. 12, 2016, 8:15 a.m. UTC | #2
On Thu, Aug 11, 2016 at 11:15 PM, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
> On 8/11/2016 8:29 AM, Jinpu Wang wrote:
>> Thanks Hal,
>>
>> Configure extra QoS policy seems quite complex.
>
> Configuration complexity varies depending on the requirements of the QoS
> needs.
>
> Which type of RDMA CM connections are being used (IPv4, IPv6, or native
> IB) ?
>
>> Do you think patch attached make sense?
>
> Attached patch doesn't appear to relate to upstream.

Indeed, it's based on MLNX_OFED 3.2.

>
> It also looks incomplete to me. What invokes rdma_set_service_level ? Is
> it some option in ucma.c:ucma_set_option ?

It is mainly intended for our in-house transport kernel module, which
supports all three connection types
(IPv4, IPv6, and native IB; IB is the default).

>
> Current patch doesn't appear to me to be backward compatible. If
> rdma_set_service_level is not called in flow, then SL should not be set
> in SA PR query which is what happens today.

Good point, I will add a check to set the SL only if it is not 0. But if
rdma_set_service_level is not called,
the SL should be 0 as before, so the SA PR query behavior shouldn't
change. Or did I miss something?

>
> Also, if SL is set in query, you probably don't need some of the other
> fields that are being set.
>
Do you mean the SL shouldn't be set together with the other fields? What would the side effect be?

> -- Hal
Thanks
Hal Rosenstock Aug. 12, 2016, 11:55 a.m. UTC | #3
On 8/12/2016 4:15 AM, Jinpu Wang wrote:
> On Thu, Aug 11, 2016 at 11:15 PM, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
>> On 8/11/2016 8:29 AM, Jinpu Wang wrote:
>>> Thanks Hal,
>>>
>>> Configure extra QoS policy seems quite complex.
>>
>> Configuration complexity varies depending on the requirements of the QoS
>> needs.
>>
>> Which type of RDMA CM connections are being used (IPv4, IPv6, or native
>> IB) ?
>>
>>> Do you think patch attached make sense?
>>
>> Attached patch doesn't appear to relate to upstream.
> 
> Indeed, it's based on MLNXOFED 3.2
> 
>>
>> It also looks incomplete to me. What invokes rdma_set_service_level ? Is
>> it some option in ucma.c:ucma_set_option ?
> 
> The main purpose is for our in house transport kernel module, it
> supports all 3 connections
> (IPv4, IPv6, and native IB, IB is the default).

>> Current patch doesn't appear to me to be backward compatible. If
>> rdma_set_service_level is not called in flow, then SL should not be set
>> in SA PR query which is what happens today.
> 
> Good point, I will add check only set SL if not 0,

0 is a valid SL so an extra bit somewhere is needed to indicate whether
a specific SL is being requested.

> but if
> rdma_set_service_level is not called,
> SL should be 0 as before, shouldn't change SA PR query behavior, or I
> missed something?

Component mask for SL in SA PR query is not on currently so that means
it's wildcarded rather than 0.
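
One way to picture the two points above is a separate flag that records whether an SL was requested at all, with the SL component-mask bit turned on only in that case. The following is a user-space sketch under stated assumptions: the `demo_*` names and the `DEMO_PATH_REC_*` bit positions are hypothetical and are not the kernel's definitions or the attached patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only, not the kernel's IB_SA_PATH_REC_* values. */
#define DEMO_PATH_REC_PKEY	(1ull << 0)
#define DEMO_PATH_REC_SL	(1ull << 1)

struct demo_id_priv {
	uint8_t sl;
	bool sl_set;	/* the "extra bit": true only after an explicit request */
};

struct demo_path_query {
	uint8_t sl;
	uint64_t comp_mask;
};

static void demo_set_service_level(struct demo_id_priv *id, uint8_t sl)
{
	id->sl = sl;
	id->sl_set = true;	/* SL 0 is valid, so the value alone cannot signal intent */
}

static struct demo_path_query demo_build_query(const struct demo_id_priv *id)
{
	struct demo_path_query q = { .sl = 0, .comp_mask = DEMO_PATH_REC_PKEY };

	/* Without the mask bit the SA treats the SL as a wildcard, as it does today. */
	if (id->sl_set) {
		q.sl = id->sl;
		q.comp_mask |= DEMO_PATH_REC_SL;
	}
	return q;
}
```

In this model a caller that never sets an SL produces the same query as before, while a caller that explicitly asks for SL 0 gets the mask bit set and is matched against SL 0 only.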

>> Also, if SL is set in query, you probably don't need some of the other
>> fields that are being set.
>>
> Do you mean SL shouldn't be set with other fields, what's the side effect there?

Never mind. It's probably best to leave those other fields as is.
Hal Rosenstock Aug. 12, 2016, 1:03 p.m. UTC | #4
On 8/12/2016 7:55 AM, Hal Rosenstock wrote:
> On 8/12/2016 4:15 AM, Jinpu Wang wrote:
>> On Thu, Aug 11, 2016 at 11:15 PM, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
>>> On 8/11/2016 8:29 AM, Jinpu Wang wrote:
>>>> Thanks Hal,
>>>>
>>>> Configure extra QoS policy seems quite complex.
>>>
>>> Configuration complexity varies depending on the requirements of the QoS
>>> needs.
>>>
>>> Which type of RDMA CM connections are being used (IPv4, IPv6, or native
>>> IB) ?
>>>
>>>> Do you think patch attached make sense?
>>>
>>> Attached patch doesn't appear to relate to upstream.
>>
>> Indeed, it's based on MLNXOFED 3.2
>>
>>>
>>> It also looks incomplete to me. What invokes rdma_set_service_level ? Is
>>> it some option in ucma.c:ucma_set_option ?
>>
>> The main purpose is for our in house transport kernel module, 

One more thing:

How does the transport module know which SL to request?

In general, SL is based on SM configuration.

Service ID and QoS Class or Traffic Class are the "higher level" IB
architected ways to obtain the SL.
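
As a rough illustration of that division of labour, the SM side can be thought of as a policy table that maps the class carried in the query to an SL. This is a toy model only: `demo_policy_rule` and `demo_sm_pick_sl` are hypothetical names, and real OpenSM QoS policy matching is richer than a single lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* One hypothetical policy entry: queries with this QoS class get this SL. */
struct demo_policy_rule {
	uint16_t qos_class;
	uint8_t sl;
};

/* Return the SL for a QoS class, or default_sl when no rule matches. */
static uint8_t demo_sm_pick_sl(const struct demo_policy_rule *rules, size_t n,
			       uint16_t qos_class, uint8_t default_sl)
{
	for (size_t i = 0; i < n; i++) {
		if (rules[i].qos_class == qos_class)
			return rules[i].sl;
	}
	return default_sl;
}
```

With an empty table every query falls through to the default, which would be consistent with the SL 0 seen in the ibdump capture when no policy is configured.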

>> it
>> supports all 3 connections
>> (IPv4, IPv6, and native IB, IB is the default).
Ira Weiny Aug. 17, 2016, 8:29 p.m. UTC | #5
On Fri, Aug 12, 2016 at 09:03:55AM -0400, Hal Rosenstock wrote:

[snip]

> >>>
> >>> It also looks incomplete to me. What invokes rdma_set_service_level ? Is
> >>> it some option in ucma.c:ucma_set_option ?
> >>
> >> The main purpose is for our in house transport kernel module, 
> 
> One more thing:
> 
> How does transport module know which SL to request ?
> 
> In general, SL is based on SM configuration.
> 
> Service ID and QoS Class or Traffic Class are the "higher level" IB
> architected ways to obtain the SL.
>

I agree with Hal here.  The ServiceID should be able to determine which Path
Records (hence which SLs) are configured by the SM.  For rdmacm applications,
the Service ID is defined in the IBTA spec:

     A11.5 IP PROTOCOL PORTS MAPPING INTO IBTA SERVICE IDS

So it seems if you configure the fabric to be in alignment with this mapping
all should work.
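
As a worked example of that mapping, the sketch below builds a Service ID from an IP protocol number and a port, assuming the layout in which byte 4 is 0x01, byte 5 is the IP protocol, and the last two bytes are the port. `demo_ip_service_id` is a hypothetical helper written for illustration, not a kernel or librdmacm function, and the port is an arbitrary example.

```c
#include <stdint.h>

#define DEMO_IPPROTO_TCP	6u
#define DEMO_IPPROTO_UDP	17u

/*
 * Assumed layout (host order): 0x00000000 01 <protocol> <port, 16 bits>.
 * The caller converts to network order before putting it on the wire.
 */
static uint64_t demo_ip_service_id(uint8_t ip_protocol, uint16_t port)
{
	return (UINT64_C(0x01) << 24) | ((uint64_t)ip_protocol << 16) | port;
}
```

For TCP port 7471 (0x1D2F) this gives 0x0000000001061D2F, so an SM policy keyed on that Service ID, or on a range of such IDs, can select the SL without the application requesting one directly.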

Ira

Patch

From beec3a2fc25d37d4f735c55231f4e0fbe3f180ac Mon Sep 17 00:00:00 2001
From: Jack Wang <jinpu.wang@profitbricks.com>
Date: Wed, 10 Aug 2016 10:50:53 +0200
Subject: [PATCH] cma: export function to set service level

We want this to isolate network traffic from storage traffic.

So extend cma to allow setting the service level for QoS.

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 drivers/infiniband/core/cma.c | 14 +++++++++++++-
 include/rdma/rdma_cm.h        | 13 +++++++++++++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 66e8516..c464aa7 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -225,6 +225,7 @@  struct rdma_id_private {
 	u32			options;
 	u8			srq;
 	u8			tos;
+	u8			sl;
 	u8			reuseaddr;
 	u8			afonly;
 	enum ib_gid_type	gid_type;
@@ -2752,6 +2753,16 @@  static void cma_listen_on_all(struct rdma_id_private *id_priv)
 	mutex_unlock(&lock);
 }
 
+void rdma_set_service_level(struct rdma_cm_id *id, u8 sl)
+{
+	struct rdma_id_private *id_priv;
+
+	id_priv = container_of(id, struct rdma_id_private, id);
+	id_priv->sl = sl;
+}
+EXPORT_SYMBOL(rdma_set_service_level);
+
+
 void rdma_set_service_type(struct rdma_cm_id *id, int tos)
 {
 	struct rdma_id_private *id_priv;
@@ -2838,9 +2849,10 @@  static int cma_query_ib_route(struct rdma_id_private *id_priv, int timeout_ms,
 	path_rec->pkey = cpu_to_be16(ib_addr_get_pkey(&addr->dev_addr));
 	path_rec->numb_path = 1;
 	path_rec->reversible = 1;
+	path_rec->sl = id_priv->sl;
 	path_rec->service_id = rdma_get_service_id(&id_priv->id, cma_dst_addr(id_priv));
 
-	comp_mask |= IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH |
+	comp_mask |= IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH | IB_SA_PATH_REC_SL |
 		    IB_SA_PATH_REC_REVERSIBLE | IB_SA_PATH_REC_SERVICE_ID;
 
 	switch (cma_family(id_priv)) {
diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h
index b34ee4e..df7030e 100644
--- a/include/rdma/rdma_cm.h
+++ b/include/rdma/rdma_cm.h
@@ -374,6 +374,19 @@  int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr,
 void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr);
 
 /**
+ * rdma_set_service_level - Set the level of service associated with a
+ *   connection identifier.
+ * @id: Communication identifier to associate with the service level.
+ * @sl: service level.
+ *
+ * The service level should be specified before
+ * performing route resolution, as existing communication on the
+ * connection identifier may be unaffected.  The level of service
+ * requested may not be supported by the network to all destinations.
+ */
+void rdma_set_service_level(struct rdma_cm_id *id, u8 sl);
+
+/**
  * rdma_set_service_type - Set the type of service associated with a
  *   connection identifier.
  * @id: Communication identifier to associated with service type.
-- 
2.7.4