diff mbox

[2/2] Ad IB_MTU_1500|9000 enums.

Message ID 1364994796-10642-2-git-send-email-jsquyres@cisco.com (mailing list archive)
State Rejected, archived
Headers show

Commit Message

Jeff Squyres April 3, 2013, 1:13 p.m. UTC
Allow specification of common Ethernet MTUs.

---
 include/rdma/ib_addr.h  | 6 +++++-
 include/rdma/ib_verbs.h | 8 ++++++--
 2 files changed, 11 insertions(+), 3 deletions(-)

Comments

Roland Dreier April 3, 2013, 4:52 p.m. UTC | #1
On Wed, Apr 3, 2013 at 6:13 AM, Jeff Squyres <jsquyres@cisco.com> wrote:
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 8a66758..4670f6f 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -174,8 +174,10 @@ enum ib_mtu {
>         IB_MTU_256  = 1,
>         IB_MTU_512  = 2,
>         IB_MTU_1024 = 3,
> -       IB_MTU_2048 = 4,
> -       IB_MTU_4096 = 5
> +       IB_MTU_1500 = 4,
> +       IB_MTU_2048 = 5,
> +       IB_MTU_4096 = 6,
> +       IB_MTU_9000 = 7
>  };

I don't think we can blithely do this... I think the IB enum values
are defined to match the values used in the IB spec (PathRecord etc).

Even if we change it so 1500 and 9000 are outside of the range used by
the IB spec, I don't understand the motivation for this change.  What
does this buy us?  How is iWARP working today without this change?

 - R.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steve Wise April 3, 2013, 5:13 p.m. UTC | #2
On 4/3/2013 11:52 AM, Roland Dreier wrote:
> On Wed, Apr 3, 2013 at 6:13 AM, Jeff Squyres <jsquyres@cisco.com> wrote:
>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
>> index 8a66758..4670f6f 100644
>> --- a/include/rdma/ib_verbs.h
>> +++ b/include/rdma/ib_verbs.h
>> @@ -174,8 +174,10 @@ enum ib_mtu {
>>          IB_MTU_256  = 1,
>>          IB_MTU_512  = 2,
>>          IB_MTU_1024 = 3,
>> -       IB_MTU_2048 = 4,
>> -       IB_MTU_4096 = 5
>> +       IB_MTU_1500 = 4,
>> +       IB_MTU_2048 = 5,
>> +       IB_MTU_4096 = 6,
>> +       IB_MTU_9000 = 7
>>   };
> I don't think we can blithely do this... I think the IB enum values
> are defined to match the values used in the IB spec (PathRecord etc).
>
> Even if we change it so 1500 and 9000 are outside of the range used by
> the IB spec, I don't understand the motivation for this change.  What
> does this buy us?  How is iWARP working today without this change?
>
>

The IB_MTU stuff really doesn't apply to iwarp devices.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Squyres April 4, 2013, 12:22 p.m. UTC | #3
On Apr 3, 2013, at 12:52 PM, Roland Dreier <roland@purestorage.com> wrote:

> I don't think we can blithely do this... I think the IB enum values
> are defined to match the values used in the IB spec (PathRecord etc).

Gotcha.  I inserted the enums in their proper numerical order to make the range comparisons simpler in ib_addr.h.  But the 1500/9000 values could be tacked at the end of the current values (e.g., 6 and 7, respectively) -- it would just necessitate some different changes in ib_addr.h.

> Even if we change it so 1500 and 9000 are outside of the range used by
> the IB spec, I don't understand the motivation for this change.  What
> does this buy us?  

Our impression was that a userspace application cannot know the max message size it can send across a UD QP without having an accurate MTU enum.  Specifically: the ibv_port_attr.max_msg_size value seems to be a higher-level value.  E.g., on Mellanox devices, .max_msg_size is the max size of RC QP messages.

Is there another way to determine max UD QP message size that we missed?

> How is iWARP working today without this change?

They lie about the actual/underlying MTU.  But they don't have UD QPs, so .max_msg_size is sufficient for their RC QPs.
Hal Rosenstock April 4, 2013, 12:31 p.m. UTC | #4
On 4/4/2013 8:22 AM, Jeff Squyres (jsquyres) wrote:
> On Apr 3, 2013, at 12:52 PM, Roland Dreier <roland@purestorage.com> wrote:
> 
>> I don't think we can blithely do this... I think the IB enum values
>> are defined to match the values used in the IB spec (PathRecord etc).
> 
> Gotcha.  I inserted the enums in their proper numerical order to make the range comparisons simpler in ib_addr.h.  But the 1500/9000 values could be tacked at the end of the current values (e.g., 6 and 7, respectively) -- it would just necessitate some different changes in ib_addr.h.

What happens if in the future the IBTA adds new MTUs and allocates those
currently reserved MTU values ? Wouldn't those values need to be
standardized at the IBTA so that conflict won't occur ?

-- Hal
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hefty, Sean April 4, 2013, 2:43 p.m. UTC | #5
> What happens if in the future the IBTA adds new MTUs and allocates those
> currently reserved MTU values ? Wouldn't those values need to be
> standardized at the IBTA so that conflict won't occur ?

The IBTA needs to standardize the values as they appear in MADs.  The software values can differ.  They would just need to be mapped.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ira Weiny April 4, 2013, 5:33 p.m. UTC | #6
> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> 
> > What happens if in the future the IBTA adds new MTUs and allocates
> > those currently reserved MTU values ? Wouldn't those values need to be
> > standardized at the IBTA so that conflict won't occur ?
> 
> The IBTA needs to standardize the values as they appear in MADs.  The
> software values can differ.  They would just need to be mapped.

Even with a map I think having IB_MTU_1500 will cause some confusion as this is not an "IB" MTU.  It seems an alternate enum name like RDMA_MTU_1500 is better.

Just a stupid users opinion,
Ira
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hefty, Sean April 4, 2013, 5:43 p.m. UTC | #7
> Even with a map I think having IB_MTU_1500 will cause some confusion as this is
> not an "IB" MTU.  It seems an alternate enum name like RDMA_MTU_1500 is better.

Couldn't these be usable MTU's for RoCE?

In hindsight, the user space API never should have exposed the mtu as an enum...

Since an enum is an int, and we're never going to have anything with an mtu <= 5 bytes, couldn't we just store all new mtu values directly as their byte value?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ira Weiny April 4, 2013, 5:57 p.m. UTC | #8
> -----Original Message-----
> From: Hefty, Sean
> 
> > Even with a map I think having IB_MTU_1500 will cause some confusion
> > as this is not an "IB" MTU.  It seems an alternate enum name like
> RDMA_MTU_1500 is better.
> 
> Couldn't these be usable MTU's for RoCE?

I guess so, I don't have much experience with RoCE.  If that is the case the RoCE annex might look at reserving these values in the spec?

> 
> In hindsight, the user space API never should have exposed the mtu as an
> enum...
> 
> Since an enum is an int, and we're never going to have anything with an mtu
> <= 5 bytes, couldn't we just store all new mtu values directly as their byte
> value?

That seems like a pretty good idea.

Ira
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Squyres April 8, 2013, 9:56 p.m. UTC | #9
On Apr 4, 2013, at 1:57 PM, "Weiny, Ira" <ira.weiny@intel.com> wrote:

>> In hindsight, the user space API never should have exposed the mtu as an
>> enum...
>> 
>> Since an enum is an int, and we're never going to have anything with an mtu
>> <= 5 bytes, couldn't we just store all new mtu values directly as their byte
>> value?
> 
> That seems like a pretty good idea.


Agreed, but changing to an int would seem to have some fairly serious backwards compatibility issues.

What is the right way to move forward here?

Just to re-state: our issue is that there does not seem to be any other way to get the max UD message size without knowing the actual MTU (are we incorrect about that?).  Hence, using the IB-defined values is not really sufficient.
Hefty, Sean April 8, 2013, 10:16 p.m. UTC | #10
> Agreed, but changing to an int would seem to have some fairly serious backwards
> compatibility issues.

Why can't IB_MTU_1500 = 1500?

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ira Weiny April 9, 2013, 8:10 p.m. UTC | #11
> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> Subject: Re: [PATCH 2/2] Ad IB_MTU_1500|9000 enums.
> 
> On Apr 4, 2013, at 1:57 PM, "Weiny, Ira" <ira.weiny@intel.com> wrote:
> 
> >> In hindsight, the user space API never should have exposed the mtu as
> >> an enum...
> >>
> >> Since an enum is an int, and we're never going to have anything with
> >> an mtu <= 5 bytes, couldn't we just store all new mtu values directly
> >> as their byte value?
> >
> > That seems like a pretty good idea.
> 
> 
> Agreed, but changing to an int would seem to have some fairly serious
> backwards compatibility issues.
> 
> What is the right way to move forward here?
> 
> Just to re-state: our issue is that there does not seem to be any other way to
> get the max UD message size without knowing the actual MTU (are we
> incorrect about that?).  Hence, using the IB-defined values is not really
> sufficient.

I guess I am confused.  Is this patch trying to support RoCE or a VNIC?

Ira


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Squyres April 9, 2013, 10:27 p.m. UTC | #12
On Apr 8, 2013, at 6:16 PM, "Hefty, Sean" <sean.hefty@intel.com> wrote:

> Why can't IB_MTU_1500 = 1500?


It certainly could.  Additionally, since Roland was a little concerned about the "IB" prefix (since 1500 and 9000 are not IBTA-sanctioned MTUs), they could have a different prefix -- perhaps RDMA_MTU_1500.  

Although I admit that it would be weird to have an enum that contains values with different prefixes:

enum ib_mtu {
        IB_MTU_256  = 1,
        IB_MTU_512  = 2,
        IB_MTU_1024 = 3,
        IB_MTU_2048 = 4,
        IB_MTU_4096 = 5,
        RDMA_MTU_1500 = 1500,
        RDMA_MTU_9000 =	9000
};
Jeff Squyres April 9, 2013, 10:29 p.m. UTC | #13
On Apr 9, 2013, at 4:10 PM, "Weiny, Ira" <ira.weiny@intel.com> wrote:

>> Just to re-state: our issue is that there does not seem to be any other way to
>> get the max UD message size without knowing the actual MTU (are we
>> incorrect about that?).  Hence, using the IB-defined values is not really
>> sufficient.
> 
> I guess I am confused.  Is this patch trying to support RoCE or a VNIC?


Both, actually.

The RoCE driver lies about its MTU (IIRC, it claims IB_MTU_1024, even if the MTU is actually 1500).  So AFAIK, there's no way to know what the UD max message size is on RoCE, because the max message size attribute on port refers to RC, not UD.
Ira Weiny April 9, 2013, 11:59 p.m. UTC | #14
> -----Original Message-----
> From: Jeff Squyres (jsquyres) [mailto:jsquyres@cisco.com]
> Subject: Re: [PATCH 2/2] Ad IB_MTU_1500|9000 enums.
> 
> On Apr 8, 2013, at 6:16 PM, "Hefty, Sean" <sean.hefty@intel.com> wrote:
> 
> > Why can't IB_MTU_1500 = 1500?
> 

Sean,

If the IBTA were to release new MTU enumerations which values would you recommend then?

Ira

> 
> It certainly could.  Additionally, since Roland was a little concerned about the
> "IB" prefix (since 1500 and 9000 are not IBTA-sanctioned MTUs), they could
> have a different prefix -- perhaps RDMA_MTU_1500.
> 
> Although I admit that it would be weird to have an enum that contains values
> with different prefixes:
> 
> enum ib_mtu {
>         IB_MTU_256  = 1,
>         IB_MTU_512  = 2,
>         IB_MTU_1024 = 3,
>         IB_MTU_2048 = 4,
>         IB_MTU_4096 = 5,
>         RDMA_MTU_1500 = 1500,
>         RDMA_MTU_9000 =	9000
> };
> 
> --
> Jeff Squyres
> jsquyres@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hefty, Sean April 10, 2013, 1:30 a.m. UTC | #15
> If the IBTA were to release new MTU enumerations which values would you
> recommend then?

I don't think there's a great solution here.  We're mixing IBTA encoded values with non-IBTA values.  We could reserve the 6-bit encoded values for IB, and use direct values for others (or at least jump beyond the 6-bit range).  Or we can stop matching new IBTA MTU encodings (e.g. IB_MTU_1500 = 6).  Or we go back in time and make mtu an int.

- Sean
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ira Weiny April 10, 2013, 2:44 a.m. UTC | #16
> -----Original Message-----
> From: Hefty, Sean
> Sent: Tuesday, April 09, 2013 6:30 PM
> To: Weiny, Ira; Jeff Squyres (jsquyres)
> Cc: Hal Rosenstock; Roland Dreier; linux-rdma@vger.kernel.org; Upinder
> Malhi (umalhi)
> Subject: RE: [PATCH 2/2] Ad IB_MTU_1500|9000 enums.
> 
> > If the IBTA were to release new MTU enumerations which values would
> > you recommend then?
> 
> I don't think there's a great solution here.  We're mixing IBTA encoded values
> with non-IBTA values.  We could reserve the 6-bit encoded values for IB, and
> use direct values for others (or at least jump beyond the 6-bit range).  Or we
> can stop matching new IBTA MTU encodings (e.g. IB_MTU_1500 = 6).  Or we
> go back in time and make mtu an int.
> 

I thought reserving the 6 bit's for IB and allowing the enum values to match the MTU was a pretty good compromise.  Especially since PathRecord is defined in sa.h which is provided by libibverbs.  That allows for that IB MTU enum to be used there.

OTOH, now that we have moved toward decent defines in the libibumad  library we could define the MTU enum there.  But then we again go down the path of defining things multiple places and confusing the users...  :-(

As an aside I like the use of RDMA_MTU_* for these values.  Again to distinguish them from the IBTA values.  But I know that is poor form.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Squyres April 12, 2013, 6:40 p.m. UTC | #17
On Apr 9, 2013, at 10:44 PM, "Weiny, Ira" <ira.weiny@intel.com> wrote:

> As an aside I like the use of RDMA_MTU_* for these values.  Again to distinguish them from the IBTA values.  But I know that is poor form.


So what's the right way to move forward on this?  Is it this:

enum ib_mtu {
       IB_MTU_256  = 1,
       IB_MTU_512  = 2,
       IB_MTU_1024 = 3,
       IB_MTU_2048 = 4,
       IB_MTU_4096 = 5,
       RDMA_MTU_1500 = 1500,
       RDMA_MTU_9000 =	9000
};
Jeff Squyres April 19, 2013, 3:24 p.m. UTC | #18
On Apr 12, 2013, at 11:40 AM, Jeff Squyres (jsquyres) <jsquyres@cisco.com> wrote:

>> As an aside I like the use of RDMA_MTU_* for these values.  Again to distinguish them from the IBTA values.  But I know that is poor form.
> 
> So what's the right way to move forward on this?  Is it this:
> 
> enum ib_mtu {
>       IB_MTU_256  = 1,
>       IB_MTU_512  = 2,
>       IB_MTU_1024 = 3,
>       IB_MTU_2048 = 4,
>       IB_MTU_4096 = 5,
>       RDMA_MTU_1500 = 1500,
>       RDMA_MTU_9000 =	9000
> };


Bump.
Doug Ledford April 20, 2013, 3:55 a.m. UTC | #19
On 04/19/2013 11:24 AM, Jeff Squyres (jsquyres) wrote:
> On Apr 12, 2013, at 11:40 AM, Jeff Squyres (jsquyres) <jsquyres@cisco.com> wrote:
> 
>>> As an aside I like the use of RDMA_MTU_* for these values.  Again to distinguish them from the IBTA values.  But I know that is poor form.
>>
>> So what's the right way to move forward on this?  Is it this:
>>
>> enum ib_mtu {
>>       IB_MTU_256  = 1,
>>       IB_MTU_512  = 2,
>>       IB_MTU_1024 = 3,
>>       IB_MTU_2048 = 4,
>>       IB_MTU_4096 = 5,
>>       RDMA_MTU_1500 = 1500,
>>       RDMA_MTU_9000 =	9000
>> };
> 
> 
> Bump.
> 

This seems reasonable, but still concerns me a bit.  The original
version was flat out wrong because you can't re-arrange any exposed enum
like this without requiring that all user space apps be recompiled.
This is especially true because ibv_mtu_enum_to_int is an inline, so any
compiled programs won't have been updated by a shared library update,
yet the new enum values can end up showing up, so the apps break when
they start seeing -1 as the return value, or when they interpret 1500 as
2048, etc.

I wonder if the correct way forward isn't to leave these two functions
alone (as inlines you can't change them without a recompile anyway).
However, I would officially change the documentation to specify that
both enum ib_mtu in the queue_pair structs and the result of
ib_mtu_enum_to_int is not exact on non-IB link layers and is deprecated
in favor of a new function: ib_qp_mtu() (or similar) that takes a queue
pair and returns the real mtu for that queue pair based on params and
device the queue pair is on (I could also see it being
ibv_dev_mtu(struct ibv_device *) instead if you want to query the mtu
before you make a queue pair instead).  Then for both IBoE and USNIC
devices, I would just store the actual MTU in the internal ibv device
struct and return that when requested.

This maintains 100% back compatibility with existing apps but allows a
path for people to get better performance/utilization by using the new
function instead of the old one.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hefty, Sean April 20, 2013, 11:29 p.m. UTC | #20
> This seems reasonable, but still concerns me a bit.  The original
> version was flat out wrong because you can't re-arrange any exposed enum
> like this without requiring that all user space apps be recompiled.
> This is especially true because ibv_mtu_enum_to_int is an inline

ib_mtu_enum_to_int() is a kernel function, not user space, so I think we're fine here, unless you're concerned about drivers built out of tree.

- Sean
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Ledford April 22, 2013, 5:30 p.m. UTC | #21
On 04/20/2013 07:29 PM, Hefty, Sean wrote:
>> This seems reasonable, but still concerns me a bit.  The original 
>> version was flat out wrong because you can't re-arrange any exposed
>> enum like this without requiring that all user space apps be
>> recompiled. This is especially true because ibv_mtu_enum_to_int is
>> an inline
> 
> ib_mtu_enum_to_int() is a kernel function, not user space, so I think
> we're fine here, unless you're concerned about drivers built out of
> tree.

Well, although *I* might have to worry about out of kernel drivers, I
wouldn't suggest such for upstream.  However, for some reason I had it
in my mind when I was reading the patch that it was against libibverbs.
 That's what I get for staying up late and reviewing when I'm tired :-/


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Squyres April 22, 2013, 6 p.m. UTC | #22
On Apr 22, 2013, at 1:30 PM, Doug Ledford <dledford@redhat.com> wrote:

> However, for some reason I had it
> in my mind when I was reading the patch that it was against libibverbs.
> That's what I get for staying up late and reviewing when I'm tired :-/


There were other patches against libibverbs that were submitted at the same time.

That being said, I see two obvious ways to go forward, both of which have pros/cons:

1. Extend the enum ib_mtu to include new enum values for 1500 and 9000 -- probably with a different prefix to indicate that they're not IBTA-sanctioned values (note that this will also require corresponding changes in libibverbs, since MTU values get passed up from kernel to userspace).

PRO: fixes the immediate problem
PRO: probably the lowest impact solution; just adding some more enum values
CON: weird naming (IB_ and RDMA_ prefixes in the same ib_mtu enum; probably something similar in userspace)
CON: doesn't do anything to address other MTU values (e.g., what if someone has an MTU of 1498?)

2. Change all instances of ib_mtu/ibv_mtu to an int.  Code such as "switch(mtu) case IBV_MTU_1024: ..." will need to be updated to "switch(mtu) case 1024: ...".

PRO: solves the problem for all MTU values
PRO: eliminates the enum-to-int translation functions
CON: much driver code will need to be updated per above, and also update logic checking for out-of-bounds MTU calues
CON: similarly, userspace apps will need to be updated; it might be worthwhile to bump libibverbs to 2.x, and then intentionally change the MTU field names in ibv_port_attr and ibv_qp_attr so that apps using those fields will fail to compile with libibverbs 2.x (and therefore forcibly realize they need to adapt to the new int MTU values)
Doug Ledford April 22, 2013, 8 p.m. UTC | #23
On 04/22/2013 02:00 PM, Jeff Squyres (jsquyres) wrote:

> 2. Change all instances of ib_mtu/ibv_mtu to an int.  Code such as 
> "switch(mtu) case IBV_MTU_1024: ..." will need to be updated to 
> "switch(mtu) case 1024: ...".
> 
> PRO: solves the problem for all MTU values
> PRO: eliminates the num-to-int translation functions
> CON: much driver code will need to be updated per above, and also
> update logic checking for out-of-bounds MTU calues
> CON: similarly, userspace apps will need to be updated; it might be
> worthwhile to bump libibverbs to 2.x, and then intentionally change
> the MTU field names in ibv_port_attr and ibv_qp_attr so that apps
> using those fields will fail to compile with libibverbs 2.x
> (and therefore forcibly realize they need to adapt to the new
> int MTU values)

I was actually thinking that an ibverbs API version 2.0 might be an
interesting way to go.  The proliferation of non-IB link layers
providing the verbs API make some of the original assumptions of IB link
layer in the original API obsolete.  But, if we were to do that, I'd
take some time to really think the issue over and try to catch all of
the needed updates in one go.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Squyres May 2, 2013, 10:43 a.m. UTC | #24
On Apr 22, 2013, at 4:00 PM, Doug Ledford <dledford@redhat.com> wrote:

>> 2. Change all instances of ib_mtu/ibv_mtu to an int.  Code such as 
>> "switch(mtu) case IBV_MTU_1024: ..." will need to be updated to 
>> "switch(mtu) case 1024: ...".
> 
> I was actually thinking that an ibverbs API version 2.0 might be an
> interesting way to go.  The proliferation of non-IB link layers
> providing the verbs API make some of the original assumptions of IB link
> layer in the original API obsolete.  But, if we were to do that, I'd
> take some time to really think the issue over and try to catch all of
> the needed updates in one go.


In addition to the MTU, another obvious issue is the active_speed attribute on the ibv_port_attr.  On the kernel side, it's an enum (IB_SPEED_SDR through IB_SPEED_EDR), but there's no corresponding enum names in libibverbs.  

It would be good to make this value a non-enum-int, too.
diff mbox

Patch

diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h
index 9996539..1f6fbbc 100644
--- a/include/rdma/ib_addr.h
+++ b/include/rdma/ib_addr.h
@@ -200,10 +200,14 @@  static inline enum ib_mtu iboe_get_mtu(int mtu)
 	 */
 	mtu = mtu - IB_GRH_BYTES - IB_BTH_BYTES - 28;
 
-	if (mtu >= ib_mtu_enum_to_int(IB_MTU_4096))
+	if (mtu >= ib_mtu_enum_to_int(IB_MTU_9000))
+		return IB_MTU_9000;
+	else if (mtu >= ib_mtu_enum_to_int(IB_MTU_4096))
 		return IB_MTU_4096;
 	else if (mtu >= ib_mtu_enum_to_int(IB_MTU_2048))
 		return IB_MTU_2048;
+	else if (mtu >= ib_mtu_enum_to_int(IB_MTU_1500))
+		return IB_MTU_1500;
 	else if (mtu >= ib_mtu_enum_to_int(IB_MTU_1024))
 		return IB_MTU_1024;
 	else if (mtu >= ib_mtu_enum_to_int(IB_MTU_512))
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 8a66758..4670f6f 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -174,8 +174,10 @@  enum ib_mtu {
 	IB_MTU_256  = 1,
 	IB_MTU_512  = 2,
 	IB_MTU_1024 = 3,
-	IB_MTU_2048 = 4,
-	IB_MTU_4096 = 5
+	IB_MTU_1500 = 4,
+	IB_MTU_2048 = 5,
+	IB_MTU_4096 = 6,
+	IB_MTU_9000 = 7
 };
 
 static inline int ib_mtu_enum_to_int(enum ib_mtu mtu)
@@ -184,8 +186,10 @@  static inline int ib_mtu_enum_to_int(enum ib_mtu mtu)
 	case IB_MTU_256:  return  256;
 	case IB_MTU_512:  return  512;
 	case IB_MTU_1024: return 1024;
+	case IB_MTU_1500: return 1500;
 	case IB_MTU_2048: return 2048;
 	case IB_MTU_4096: return 4096;
+	case IB_MTU_9000: return 9000;
 	default: 	  return -1;
 	}
 }