diff mbox

[rdma-next,V1,2/4] IB/core: Support rate limit for packet pacing

Message ID 1480592596-20126-3-git-send-email-leon@kernel.org (mailing list archive)
State Accepted
Headers show

Commit Message

Leon Romanovsky Dec. 1, 2016, 11:43 a.m. UTC
From: Bodong Wang <bodong@mellanox.com>

Add a new member, rate_limit, to struct ib_qp_attr. It holds the packet
pacing rate in kbps; 0 means unlimited.

IB_QP_RATE_LIMIT is added to enum ib_qp_attr_mask and can be used by RAW
packet QPs when changing QP state from RTR to RTS or from RTS to RTS.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
---
 drivers/infiniband/core/verbs.c | 2 ++
 include/rdma/ib_verbs.h         | 2 ++
 2 files changed, 4 insertions(+)

Comments

Hefty, Sean Dec. 1, 2016, 7:40 p.m. UTC | #1
>  enum ib_qp_state {
> @@ -1151,6 +1152,7 @@ struct ib_qp_attr {
>  	u8			rnr_retry;
>  	u8			alt_port_num;
>  	u8			alt_timeout;
> +	u32			rate_limit;
>  };

And I still disagree with this approach, as there is already an existing field in the API that limits rate.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Leon Romanovsky Dec. 2, 2016, 2:39 p.m. UTC | #2
On Thu, Dec 01, 2016 at 07:40:44PM +0000, Hefty, Sean wrote:
> >  enum ib_qp_state {
> > @@ -1151,6 +1152,7 @@ struct ib_qp_attr {
> >  	u8			rnr_retry;
> >  	u8			alt_port_num;
> >  	u8			alt_timeout;
> > +	u32			rate_limit;
> >  };
>
> And I still disagree with this approach, as there is already an existing field in the API that limits rate.

Hi Sean,

We would like to elaborate on the subject, and we need your help
here. Our cover letter [1] describes why the existing field is not
sufficient. Could you explain in a bit more detail why you dislike
the proposed approach to an existing, real problem?

Thanks

[1] https://www.spinics.net/lists/linux-rdma/msg43585.html
Liran Liss Dec. 4, 2016, 11:53 a.m. UTC | #3
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Leon Romanovsky

> 
> On Thu, Dec 01, 2016 at 07:40:44PM +0000, Hefty, Sean wrote:
> > >  enum ib_qp_state {
> > > @@ -1151,6 +1152,7 @@ struct ib_qp_attr {
> > >  	u8			rnr_retry;
> > >  	u8			alt_port_num;
> > >  	u8			alt_timeout;
> > > +	u32			rate_limit;
> > >  };
> >
> > And I still disagree with this approach, as there is already an existing field in the
> API that limits rate.
> 
> Hi Sean,
> 
> We would like to elaborate more on the subject, and we need your help here.
> Our cover letter [1] has description why the existing field is not enough. Can you
> write a little bit more why you didn't like the proposed approach to an existing
> and real problem?
> 

Clearly, we need a new field that is wider than 8 bits and that uses a different encoding (an actual rate rather than a predefined enumeration of values).

However, this is not only a new field - this rate is also conceptually different from the AH static rate:
1) It is not related to the fabric path.
2) It is determined by the application rather than by the fabric.
3) It needs to apply to QPs that don't use an AH (raw Ethernet QPs).

Of course, we might require raw Ethernet QPs to use an AH just for the sake of rate-limiting.
But 1) + 2) indicate that this should be a concept orthogonal to network paths. In fact, it could apply to RC and UD QPs as well.
In addition, if we implement full connection establishment in the kernel (i.e., the static_rate field is filled in the kernel), we would still like to allow applications to further restrict their rate using a different component mask.
--Liran

Doug Ledford Dec. 14, 2016, 7:03 p.m. UTC | #4
On 12/4/2016 6:53 AM, Liran Liss wrote:
>> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
>> owner@vger.kernel.org] On Behalf Of Leon Romanovsky
> 
>>
>> On Thu, Dec 01, 2016 at 07:40:44PM +0000, Hefty, Sean wrote:
>>>>  enum ib_qp_state {
>>>> @@ -1151,6 +1152,7 @@ struct ib_qp_attr {
>>>>  	u8			rnr_retry;
>>>>  	u8			alt_port_num;
>>>>  	u8			alt_timeout;
>>>> +	u32			rate_limit;
>>>>  };
>>>
>>> And I still disagree with this approach, as there is already an existing field in the
>> API that limits rate.
>>
>> Hi Sean,
>>
>> We would like to elaborate more on the subject, and we need your help here.
>> Our cover letter [1] has description why the existing field is not enough. Can you
>> write a little bit more why you didn't like the proposed approach to an existing
>> and real problem?
>>
> 
> Clearly, we need a new field that is greater than 8 bits that uses different encoding (actual rate rather than a predefined enumeration of values).
> 
> However, this is not only a new field - this rate is also conceptually different than AH static rate:
> 1) It is not related to the fabric path.
> 2) It is determined by the application rather than the fabric.
> 3) It needs to apply to QPs that don't use AH (raw Ethernet QPs)
> 
> Of course, we might require raw Ethernet QPs to use an AH just for the sake of rate-limiting.
> But 1) + 2) indicate that this should be a concept orthogonal to network paths. In fact, it could apply to RC and UD QPs as well.
> In addition, if we implement full connection establishment in the kernel (i.e., the static_rate field is filled in the kernel), we would still like to allow applications to further restrict their rate using a different component mask.
> --Liran
> 

I applied this series.  I agree with Liran that something other than an
enum is needed to denote rates.  We will have to work out exactly how to
present this to user space (keep both, replace static_rate and bump the
verbs API number, or other options?)
Ira Weiny Dec. 16, 2016, 2:48 a.m. UTC | #5
On Wed, Dec 14, 2016 at 02:03:14PM -0500, Doug Ledford wrote:
> On 12/4/2016 6:53 AM, Liran Liss wrote:
> >> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> >> owner@vger.kernel.org] On Behalf Of Leon Romanovsky
> > 
> >>
> >> On Thu, Dec 01, 2016 at 07:40:44PM +0000, Hefty, Sean wrote:
> >>>>  enum ib_qp_state {
> >>>> @@ -1151,6 +1152,7 @@ struct ib_qp_attr {
> >>>>  	u8			rnr_retry;
> >>>>  	u8			alt_port_num;
> >>>>  	u8			alt_timeout;
> >>>> +	u32			rate_limit;
> >>>>  };
> >>>
> >>> And I still disagree with this approach, as there is already an existing field in the
> >> API that limits rate.
> >>
> >> Hi Sean,
> >>
> >> We would like to elaborate more on the subject, and we need your help here.
> >> Our cover letter [1] has description why the existing field is not enough. Can you
> >> write a little bit more why you didn't like the proposed approach to an existing
> >> and real problem?
> >>
> > 
> > Clearly, we need a new field that is greater than 8 bits that uses different encoding (actual rate rather than a predefined enumeration of values).
> > 
> > However, this is not only a new field - this rate is also conceptually different than AH static rate:
> > 1) It is not related to the fabric path.
> > 2) It is determined by the application rather than the fabric.
> > 3) It needs to apply to QPs that don't use AH (raw Ethernet QPs)
> > 
> > Of course, we might require raw Ethernet QPs to use an AH just for the sake of rate-limiting.
> > But 1) + 2) indicate that this should be a concept orthogonal to network paths. In fact, it could apply to RC and UD QPs as well.
> > In addition, if we implement full connection establishment in the kernel (i.e., the static_rate field is filled in the kernel), we would still like to allow applications to further restrict their rate using a different component mask.
> > --Liran
> > 
> 
> I applied this series.  I agree with Liran that the need for something
> other than an enum to denote rates is needed.  We will have to work out
> exactly how to present this to user space (keep both, replace
> static_rate and bumb the verbs API number, other options?)

I think in the new ioctl interface values like this should be more generic.
MTU is another example.

FWIW I agree with Sean that static rate should have been changed to accommodate
this expanded meaning.

With this new structure definition, which rate is in effect?  Are devices
supposed to take the min of the two?

Ira

> 
> -- 
> Doug Ledford <dledford@redhat.com>
>     GPG Key ID: 0E572FDD
> 



Doug Ledford Dec. 16, 2016, 3:53 a.m. UTC | #6
On 12/15/2016 9:48 PM, ira.weiny wrote:
> On Wed, Dec 14, 2016 at 02:03:14PM -0500, Doug Ledford wrote:
>> On 12/4/2016 6:53 AM, Liran Liss wrote:
>>>> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
>>>> owner@vger.kernel.org] On Behalf Of Leon Romanovsky
>>>
>>>>
>>>> On Thu, Dec 01, 2016 at 07:40:44PM +0000, Hefty, Sean wrote:
>>>>>>  enum ib_qp_state {
>>>>>> @@ -1151,6 +1152,7 @@ struct ib_qp_attr {
>>>>>>  	u8			rnr_retry;
>>>>>>  	u8			alt_port_num;
>>>>>>  	u8			alt_timeout;
>>>>>> +	u32			rate_limit;
>>>>>>  };
>>>>>
>>>>> And I still disagree with this approach, as there is already an existing field in the
>>>> API that limits rate.
>>>>
>>>> Hi Sean,
>>>>
>>>> We would like to elaborate more on the subject, and we need your help here.
>>>> Our cover letter [1] has description why the existing field is not enough. Can you
>>>> write a little bit more why you didn't like the proposed approach to an existing
>>>> and real problem?
>>>>
>>>
>>> Clearly, we need a new field that is greater than 8 bits that uses different encoding (actual rate rather than a predefined enumeration of values).
>>>
>>> However, this is not only a new field - this rate is also conceptually different than AH static rate:
>>> 1) It is not related to the fabric path.
>>> 2) It is determined by the application rather than the fabric.
>>> 3) It needs to apply to QPs that don't use AH (raw Ethernet QPs)
>>>
>>> Of course, we might require raw Ethernet QPs to use an AH just for the sake of rate-limiting.
>>> But 1) + 2) indicate that this should be a concept orthogonal to network paths. In fact, it could apply to RC and UD QPs as well.
>>> In addition, if we implement full connection establishment in the kernel (i.e., the static_rate field is filled in the kernel), we would still like to allow applications to further restrict their rate using a different component mask.
>>> --Liran
>>>
>>
>> I applied this series.  I agree with Liran that the need for something
>> other than an enum to denote rates is needed.  We will have to work out
>> exactly how to present this to user space (keep both, replace
>> static_rate and bumb the verbs API number, other options?)
> 
> I think in the new ioctl interface values like this should be more generic.
> MTU is another example.

Agreed.

> FWIW I agree with Sean that static rate should have been changed to accommodate
> this expanded meaning.

Changing static rate only matters at the user application level.  At the
driver-to-kernel level, it's easy to say this is driver-specific and the
user space driver can communicate with the kernel driver any way it
chooses.  It's a non-issue at this level.  It's later that we have to
figure things out, which was the point of my comments and why I took this.

> With this new structure definition which rate is in effect?  Are devices
> supposed to take the min of the 2?

How did you write your kernel driver, and how did you write your user
space library?  Until the rdma-core-to-application API gets some
modification, your user space driver will only ever get static_rate with
its existing meaning from the user app, and what your driver does with
it internally is moot.  What's more, it doesn't matter whether mlx5 and
hfi1 do the same thing, only that each is consistent with itself.  It's
when we add that user-facing bit that we have to make things right and
set them in concrete.

Patch

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 8368764..3e688b3 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1014,6 +1014,7 @@  static const struct {
 						 IB_QP_QKEY),
 				 [IB_QPT_GSI] = (IB_QP_CUR_STATE		|
 						 IB_QP_QKEY),
+				 [IB_QPT_RAW_PACKET] = IB_QP_RATE_LIMIT,
 			 }
 		}
 	},
@@ -1047,6 +1048,7 @@  static const struct {
 						IB_QP_QKEY),
 				[IB_QPT_GSI] = (IB_QP_CUR_STATE			|
 						IB_QP_QKEY),
+				[IB_QPT_RAW_PACKET] = IB_QP_RATE_LIMIT,
 			}
 		},
 		[IB_QPS_SQD]   = {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 5ad43a4..a065361 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1102,6 +1102,7 @@  enum ib_qp_attr_mask {
 	IB_QP_RESERVED2			= (1<<22),
 	IB_QP_RESERVED3			= (1<<23),
 	IB_QP_RESERVED4			= (1<<24),
+	IB_QP_RATE_LIMIT		= (1<<25),
 };
 
 enum ib_qp_state {
@@ -1151,6 +1152,7 @@  struct ib_qp_attr {
 	u8			rnr_retry;
 	u8			alt_port_num;
 	u8			alt_timeout;
+	u32			rate_limit;
 };
 
 enum ib_wr_opcode {