[RFI] ucmatose: No effect to set service type for QoS

Message ID CAMGffEmsG7wMnFNP5VWx9imypNRXddaynQT1UU4ia-TTKF+psg@mail.gmail.com (mailing list archive)
State Deferred

Commit Message

Jinpu Wang Aug. 12, 2016, 1:10 p.m. UTC
On Fri, Aug 12, 2016 at 3:03 PM, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
> On 8/12/2016 7:55 AM, Hal Rosenstock wrote:
>> On 8/12/2016 4:15 AM, Jinpu Wang wrote:
>>> On Thu, Aug 11, 2016 at 11:15 PM, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
>>>> On 8/11/2016 8:29 AM, Jinpu Wang wrote:
>>>>> On Wed, Aug 10, 2016 at 8:52 PM, Hal Rosenstock <hal@dev.mellanox.co.il> wrote:
>>>>>> On 8/9/2016 12:26 PM, Jinpu Wang wrote:
>>>>>>> Hi Sean,
>>>>>>>
>>>>>>> I'm testing QoS support for IB. I notice ucmatose shows equal
>>>>>>> performance when set to different service types, but setting the SL in
>>>>>>> ib_send_bw works well (different SLs show different performance based
>>>>>>> on the opensm settings).
>>>>>>>
>>>>>>> I captured packets using ibdump; it shows that the service level
>>>>>>> fields in the LRH are all 0 when running traffic with ucmatose.
>>>>>>>
>>>>>>> When running ib_send_bw, it carries the right service level I set.
>>>>>>>
>>>>>>> It seems that rdma_set_service_type stores tos in id_priv->tos, which
>>>>>>> is later copied to path_rec->qos_class or traffic_class but not to sl
>>>>>>> directly. What's the consideration here?
>>>>>>> Code snippet:
>>>>>>>         switch (cma_family(id_priv)) {
>>>>>>>         case AF_INET:
>>>>>>>                 path_rec->qos_class = cpu_to_be16((u16) id_priv->tos);
>>>>>>>                 comp_mask |= IB_SA_PATH_REC_QOS_CLASS;
>>>>>>>                 break;
>>>>>>>         case AF_INET6:
>>>>>>>                 sin6 = (struct sockaddr_in6 *) cma_src_addr(id_priv);
>>>>>>>                 path_rec->traffic_class = (u8) (be32_to_cpu(sin6->sin6_flowinfo) >> 20);
>>>>>>>                 comp_mask |= IB_SA_PATH_REC_TRAFFIC_CLASS;
>>>>>>>                 break;
>>>>>>>         case AF_IB:
>>>>>>>                 sib = (struct sockaddr_ib *) cma_src_addr(id_priv);
>>>>>>>                 path_rec->traffic_class = (u8) (be32_to_cpu(sib->sib_flowinfo) >> 20);
>>>>>>>
>>>>>>>
>>>>>>> Does it make sense to also set sl here, or is the service type for
>>>>>>> ucmatose totally different from the SL for ib_send_bw?
>>>>>>
>>>>>> I think this is an OpenSM configuration issue. QoS policy needs to be
>>>>>> set up to return the proper SL to use for QoS class or TClass in the
>>>>>> PathRecord response.
>>>>>>
>>>>>> -- Hal
>>>>>>
>>>>> Thanks Hal,
>>>>>
>>>>> Configuring an extra QoS policy seems quite complex.
>>>>
>>>> Configuration complexity varies depending on the QoS requirements.
>>>>
>>>> Which type of RDMA CM connections are being used (IPv4, IPv6, or native
>>>> IB) ?
>>>>
>>>>> Do you think the attached patch makes sense?
>>>>
>>>> Attached patch doesn't appear to relate to upstream.
>>>
>>> Indeed, it's based on MLNXOFED 3.2
>>>
>>>>
>>>> It also looks incomplete to me. What invokes rdma_set_service_level ? Is
>>>> it some option in ucma.c:ucma_set_option ?
>>>
>>> The main purpose is for our in house transport kernel module,
>
> One more thing:
>
> How does transport module know which SL to request ?
>
> In general, SL is based on SM configuration.
>
> Service ID and QoS Class or Traffic Class are the "higher level" IB
> architected ways to obtain the SL.
>
>>> it
>>> supports all 3 connections
>>> (IPv4, IPv6, and native IB, IB is the default).
>>
>>>> Current patch doesn't appear to me to be backward compatible. If
>>>> rdma_set_service_level is not called in flow, then SL should not be set
>>>> in SA PR query which is what happens today.
>>>
>>> Good point, I will add a check to only set SL if it is not 0,
>>
>> 0 is a valid SL so an extra bit somewhere is needed to indicate whether
>> a specific SL is being requested.
>>
>>> but if
>>> rdma_set_service_level is not called,
>>> SL should be 0 as before, so SA PR query behavior shouldn't change, or am I
>>> missing something?
>>
>> Component mask for SL in SA PR query is not on currently so that means
>> it's wildcarded rather than 0.
>>
>>>> Also, if SL is set in query, you probably don't need some of the other
>>>> fields that are being set.
>>>>
>>> Do you mean SL shouldn't be set together with the other fields? What's the side effect there?
>>
>> Never mind. It's probably best to leave those other fields as is.
>>

Thanks, I have updated my patch to address your comments.

For configuration, we will introduce a module parameter that the sysadmin
sets, similar to how the SL is passed to perf tools such as ib_send_bw.
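
To make the backward-compatibility point concrete, here is a minimal
user-space sketch of the sl/sl_on logic. It is only a model: the model_*
names and the mask bit values are hypothetical stand-ins, not the kernel
definitions. It shows why a separate flag is needed, since 0 is a valid SL,
and that the SL component stays wildcarded unless the setter was called.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask bits for this model; NOT the kernel's IB_SA_PATH_REC_* values. */
#define MODEL_PATH_REC_PKEY (1u << 0)
#define MODEL_PATH_REC_SL   (1u << 1)

/* Hypothetical stand-in for the two fields the patch adds to rdma_id_private. */
struct model_id_priv {
	uint8_t sl;
	bool sl_on;	/* true only after an SL was explicitly requested */
};

/* Hypothetical stand-in for the one path record field of interest. */
struct model_path_rec {
	uint8_t sl;
};

/* Mirrors the setter in the patch: record the SL and mark it as requested. */
static void model_set_service_level(struct model_id_priv *id_priv, uint8_t sl)
{
	id_priv->sl = sl;
	id_priv->sl_on = true;
}

/*
 * Mirrors the query path: the SL bit is added to the component mask only
 * when an SL was requested. Without the bit, the SL field is a wildcard
 * in the query, whatever value the record happens to hold.
 */
static uint32_t model_build_query(const struct model_id_priv *id_priv,
				  struct model_path_rec *rec)
{
	uint32_t comp_mask = MODEL_PATH_REC_PKEY;

	if (id_priv->sl_on) {
		rec->sl = id_priv->sl;
		comp_mask |= MODEL_PATH_REC_SL;
	}
	return comp_mask;
}
```

In this model, a zero-initialized model_id_priv yields a mask without
MODEL_PATH_REC_SL, while calling model_set_service_level(&id_priv, 0) first
yields a mask with it, so an explicit request for SL 0 is distinguishable
from no request at all.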

Patch

From 385edc9d217b8175e1c55b52302571b1d21d8d71 Mon Sep 17 00:00:00 2001
From: Jack Wang <jinpu.wang@profitbricks.com>
Date: Wed, 10 Aug 2016 10:50:53 +0200
Subject: [PATCH] cma: export function to set service level

We want this to isolate network traffic from storage traffic.

Extend cma so that a caller can request a specific service level for QoS.
To keep the old behavior, the SL component mask is only applied when sl_on
is set.

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 drivers/infiniband/core/cma.c | 17 +++++++++++++++++
 include/rdma/rdma_cm.h        | 13 +++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 66e8516..4b4d453 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -225,6 +225,8 @@  struct rdma_id_private {
 	u32			options;
 	u8			srq;
 	u8			tos;
+	u8			sl;
+	bool			sl_on;
 	u8			reuseaddr;
 	u8			afonly;
 	enum ib_gid_type	gid_type;
@@ -2752,6 +2754,17 @@  static void cma_listen_on_all(struct rdma_id_private *id_priv)
 	mutex_unlock(&lock);
 }
 
+void rdma_set_service_level(struct rdma_cm_id *id, u8 sl)
+{
+	struct rdma_id_private *id_priv;
+
+	id_priv = container_of(id, struct rdma_id_private, id);
+	id_priv->sl = sl;
+	id_priv->sl_on = true;
+}
+EXPORT_SYMBOL(rdma_set_service_level);
+
+
 void rdma_set_service_type(struct rdma_cm_id *id, int tos)
 {
 	struct rdma_id_private *id_priv;
@@ -2838,6 +2851,10 @@  static int cma_query_ib_route(struct rdma_id_private *id_priv, int timeout_ms,
 	path_rec->pkey = cpu_to_be16(ib_addr_get_pkey(&addr->dev_addr));
 	path_rec->numb_path = 1;
 	path_rec->reversible = 1;
+	if (id_priv->sl_on) {
+		path_rec->sl = id_priv->sl;
+		comp_mask |= IB_SA_PATH_REC_SL;
+	}
 	path_rec->service_id = rdma_get_service_id(&id_priv->id, cma_dst_addr(id_priv));
 
 	comp_mask |= IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH |
diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h
index b34ee4e..df7030e 100644
--- a/include/rdma/rdma_cm.h
+++ b/include/rdma/rdma_cm.h
@@ -374,6 +374,19 @@  int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr,
 void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr);
 
 /**
+ * rdma_set_service_level - Set the level of service associated with a
+ *   connection identifier.
+ * @id: Communication identifier to associate with the service level.
+ * @sl: service level.
+ *
+ * The service level should be specified before
+ * performing route resolution, as existing communication on the
+ * connection identifier may be unaffected.  The level of service
+ * requested may not be supported by the network to all destinations.
+ */
+void rdma_set_service_level(struct rdma_cm_id *id, u8 sl);
+
+/**
  * rdma_set_service_type - Set the type of service associated with a
  *   connection identifier.
  * @id: Communication identifier to associated with service type.
-- 
2.7.4