diff mbox

[RFC] New verbs API for flow action: ENCAP

Message ID CAFgAxU81GmtWkHpeF4CTPq2KDezVRmc7kkrPdAS5H8Higxg=jQ@mail.gmail.com (mailing list archive)
State RFC
Headers show

Commit Message

Alex Rosenbaum April 4, 2018, 9:15 a.m. UTC
This new verbs flow action allows the user to create an encapsulation
operation to be done by the NIC and attach it to a steering entry as an
action.

ibv_create_flow_action_encap() is introduced which accepts the
encapsulation protocol and a buffer which contains the encapsulating
headers.

We expose one variant at this time:

IBV_FLOW_ACTION_ENCAP_RAW: Allows user to provide a custom/unspecified
encapsulation protocol buffer.

See example code which demo's usage of of IBV_FLOW_ACTION_ENCAP_RAW, to define
an IPv4 VXLAN encapsulation header, in which case the user provides the
following headers in the buffer: Ethernet, IPv4, UDP and VXLAN.

Setting up an ENCAP flow action for an egress flow steering rule can fully
offload software based route lookup of an overlay network with the addition of
mapping an additional sge's when sending traffic on a RAW QP.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Alex Rosenbaum <alexr@mellanox.com>
---
 libibverbs/examples/encap_flow.c                 | 152 +++++++++++++++++++++++
 libibverbs/man/ibv_create_flow_action_encap.3.md |  82 ++++++++++++
 libibverbs/verbs.h                               |  24 ++++
 3 files changed, 258 insertions(+)
 create mode 100644 libibverbs/examples/encap_flow.c
 create mode 100644 libibverbs/man/ibv_create_flow_action_encap.3.md

                                    uint64_t dm_offset, size_t length,
                                    unsigned int access);
@@ -1963,6 +1972,21 @@ ibv_modify_flow_action_esp(struct
ibv_flow_action *action,
  return vctx->modify_flow_action_esp(action, esp);
 }

+static inline struct ibv_flow_action *
+ibv_create_flow_action_encap(struct ibv_context *ctx,
+      struct ibv_flow_action_encap_attr *encap)
+{
+ struct verbs_context *vctx = verbs_get_ctx_op(ctx,
+       create_flow_action_encap);
+
+ if (!vctx) {
+ errno = ENOSYS;
+ return NULL;
+ }
+
+ return vctx->create_flow_action_encap(ctx, encap);
+}
+
 static inline int ibv_destroy_flow_action(struct ibv_flow_action *action)
 {
  struct verbs_context *vctx = verbs_get_ctx_op(action->context,

Comments

Jason Gunthorpe April 6, 2018, 5:21 p.m. UTC | #1
On Wed, Apr 04, 2018 at 12:15:53PM +0300, Alex Rosenbaum wrote:

Your diff seems to have been whitespace mangled..

> +# SYNOPSIS
> +
> +```c
> +#include <infiniband/verbs.h>
> +
> +struct ibv_flow_action *
> +ibv_create_flow_action_encap(struct ibv_context *ctx,
> +      struct ibv_flow_action_encap_attr *attr);
> +
> +int ibv_destroy_flow_action(struct ibv_flow_action *action);
> +```

> +# DESCRIPTION
> +
> +An ENCAP flow steering action allows a flow steering rule to encapsulate
> +a packet after matching.
> +
> +After the action is created, it should be associated with a *struct
> +ibv_flow_attr* using *struct ibv_flow_spec_action_handle* flow specification.
> +An action can be associated with multiple flows.
> +
> +# ARGUMENTS
> +
> +*ctx*
> +: RDMA device context to create the action on.
> +
> +*attr*
> +: ENCAP type and header to be used.
> +
> +*action*
> +: Existing action to be destroyed.
> +
> +## *action* Argument
> +
> +```c
> +struct ibv_flow_action_encap_attr {
> + enum ibv_flow_action_encap hdr_proto;
> + uint16_t                hdr_len;
> + void                    *hdr_ptr;
> +};
> +```

We don't really need this struct, we really have struct-itis in verbs,
just put these three things as function arguments.

hdr_len should be size_t and hdr_ptr is const

> +*hdr_proto*
> +: Encapsulation protocol to be used.
> +
> +*hdr_len*
> +: Length in bytes of the provided encap headers.
> +
> +*hdr_ptr*
> +: Buffing containing the encap headers.

Describe the lifetime, I assume this function copies the header
internally so hdr_ptr can be deleted by the caller immediately?

> +## Encapsulate protocols (*ibv_flow_action_encap*)
> +
> +A user can create flow actions that implement different encapsulation
> +protocols.
> +
> +*IBV_FLOW_ACTION_ENCAP_RAW* provides the ability to provide unspecified
> +encapsulation protocol, this can be expressed by passing it in *hdr_proto*.

This man page doesn't actually describe what is supposed to happen.

I guess it is something like

Prepend the given hdr_len bytes of hdr_ptr in front of the packet
before sending it

 ?

Given that is RAW the right proto mode? Maybe just 'PREPEND' ?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Bloch April 8, 2018, 8:21 a.m. UTC | #2
On 06/04/2018 20:21, Jason Gunthorpe wrote:
> On Wed, Apr 04, 2018 at 12:15:53PM +0300, Alex Rosenbaum wrote:
> 
> Your diff seems to have been whitespace mangled..
> 
>> +# SYNOPSIS
>> +
>> +```c
>> +#include <infiniband/verbs.h>
>> +
>> +struct ibv_flow_action *
>> +ibv_create_flow_action_encap(struct ibv_context *ctx,
>> +      struct ibv_flow_action_encap_attr *attr);
>> +
>> +int ibv_destroy_flow_action(struct ibv_flow_action *action);
>> +```
> 
>> +# DESCRIPTION
>> +
>> +An ENCAP flow steering action allows a flow steering rule to encapsulate
>> +a packet after matching.
>> +
>> +After the action is created, it should be associated with a *struct
>> +ibv_flow_attr* using *struct ibv_flow_spec_action_handle* flow specification.
>> +An action can be associated with multiple flows.
>> +
>> +# ARGUMENTS
>> +
>> +*ctx*
>> +: RDMA device context to create the action on.
>> +
>> +*attr*
>> +: ENCAP type and header to be used.
>> +
>> +*action*
>> +: Existing action to be destroyed.
>> +
>> +## *action* Argument
>> +
>> +```c
>> +struct ibv_flow_action_encap_attr {
>> + enum ibv_flow_action_encap hdr_proto;
>> + uint16_t                hdr_len;
>> + void                    *hdr_ptr;
>> +};
>> +```
> 
> We don't really need this struct, we really have struct-itis in verbs,
> just put these three things as function arguments.
> 
> hdr_len should be size_t and hdr_ptr is const
> 

Will fix.

>> +*hdr_proto*
>> +: Encapsulation protocol to be used.
>> +
>> +*hdr_len*
>> +: Length in bytes of the provided encap headers.
>> +
>> +*hdr_ptr*
>> +: Buffing containing the encap headers.
> 
> Describe the lifetime, I assume this function copies the header
> internally so hdr_ptr can be deleted by the caller immediately?

Yep, actually the kernel driver only gets an ID/INDEX to be used.
Will update.

> 
>> +## Encapsulate protocols (*ibv_flow_action_encap*)
>> +
>> +A user can create flow actions that implement different encapsulation
>> +protocols.
>> +
>> +*IBV_FLOW_ACTION_ENCAP_RAW* provides the ability to provide unspecified
>> +encapsulation protocol, this can be expressed by passing it in *hdr_proto*.
> 
> This man page doesn't actually describe what is supposed to happen.
> 
> I guess it is something like
> 
> Prepend the given hdr_len bytes of hdr_ptr in front of the packet
> before sending it
> 
>  ?

Well, this is mostly true, the HW might be doing some modification to the
headers, like writing the correct checksum value/proper length and stuff
like that, but it's all pretty standard stuff when it comes to encapsulation.

One thing I didn't mention, is that there might be some MTU math that needs to
be done by the user, for example if the MTU is 1500, and you know you are going
to do vxlan encap (without vlan) in HW, you probably don't want to send packets bigger than
1450 as the encap operation will add 50 bytes to the packet.

> 
> Given that is RAW the right proto mode? Maybe just 'PREPEND' ?
I think RAW is better, as there might be some other vendors that will need to provide
a type, like MPLS/GRE/VXLAN, and then the PREPEND will look out of place as they
all do pre append.

> 
> Jason
> 

Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe April 9, 2018, 5:15 p.m. UTC | #3
On Sun, Apr 08, 2018 at 11:21:59AM +0300, Mark Bloch wrote:

> >> +## Encapsulate protocols (*ibv_flow_action_encap*)
> >> +
> >> +A user can create flow actions that implement different encapsulation
> >> +protocols.
> >> +
> >> +*IBV_FLOW_ACTION_ENCAP_RAW* provides the ability to provide unspecified
> >> +encapsulation protocol, this can be expressed by passing it in *hdr_proto*.
> > 
> > This man page doesn't actually describe what is supposed to happen.
> > 
> > I guess it is something like
> > 
> > Prepend the given hdr_len bytes of hdr_ptr in front of the packet
> > before sending it
> > 
> >  ?
> 
> Well, this is mostly true, the HW might be doing some modification to the
> headers, like writing the correct checksum value/proper length and stuff
> like that, but it's all pretty standard stuff when it comes to encapsulation.

Well.. that is a big issue and needs careful explanation.

How is the user space to know if the driver supports the 'special'
transformations ?

Seems like a pretty big gap.

> > Given that is RAW the right proto mode? Maybe just 'PREPEND' ?
> I think RAW is better, as there might be some other vendors that will need to provide
> a type, like MPLS/GRE/VXLAN, and then the PREPEND will look out of place as they
> all do pre append.

If the HW is changing it then RAW is not a good name.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Rosenbaum April 10, 2018, 2:31 p.m. UTC | #4
On Mon, Apr 9, 2018 at 8:15 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Sun, Apr 08, 2018 at 11:21:59AM +0300, Mark Bloch wrote:
>
>> >> +## Encapsulate protocols (*ibv_flow_action_encap*)
>> >> +
>> >> +A user can create flow actions that implement different encapsulation
>> >> +protocols.
>> >> +
>> >> +*IBV_FLOW_ACTION_ENCAP_RAW* provides the ability to provide unspecified
>> >> +encapsulation protocol, this can be expressed by passing it in *hdr_proto*.
>> >
>> > This man page doesn't actually describe what is supposed to happen.
>> >
>> > I guess it is something like
>> >
>> > Prepend the given hdr_len bytes of hdr_ptr in front of the packet
>> > before sending it
>> >
>> >  ?
>>
>> Well, this is mostly true, the HW might be doing some modification to the
>> headers, like writing the correct checksum value/proper length and stuff
>> like that, but it's all pretty standard stuff when it comes to encapsulation.
>
> Well.. that is a big issue and needs careful explanation.
>
> How is the user space to know if the driver supports the 'special'
> transformations ?
>
> Seems like a pretty big gap.
>

yes, HW capabilities is missing.
I suggest we replace the 'enum type' with 'int flags'.
enum ibv_flow_action_encap_flags {
    IBV_ENCAP_FLAGS_CSUM = 1 << 0,  /* HW offload to calc outer and
inner CSUM for IP and TCP headers */
    IBV_ENCAP_FLAGS_VXLAN_SPORT_ENTROPY = 1 << 1,  /* HW offload calc
of UDP src port entropy value based on inner packet headers */
};

So if user tries to create encap action with CSUM offload and HW does
not support, action creation will fail.

The providers might want to walk the users header to check what user
is trying to do and match these with the flags provided.

>> > Given that is RAW the right proto mode? Maybe just 'PREPEND' ?
>> I think RAW is better, as there might be some other vendors that will need to provide
>> a type, like MPLS/GRE/VXLAN, and then the PREPEND will look out of place as they
>> all do pre append.
>
> If the HW is changing it then RAW is not a good name.

True, this functionality will prepending user bytes to the original
packet, but the common kernel network term is to 'encapsulate' the
original packet with new headers.
Still this needs to be better explained in the V2 man page.

Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe April 10, 2018, 2:44 p.m. UTC | #5
On Tue, Apr 10, 2018 at 05:31:06PM +0300, Alex Rosenbaum wrote:
> > Well.. that is a big issue and needs careful explanation.
> >
> > How is the user space to know if the driver supports the 'special'
> > transformations ?
> >
> > Seems like a pretty big gap.
> >
> 
> yes, HW capabilities is missing.
> I suggest we replace the 'enum type' with 'int flags'.
> enum ibv_flow_action_encap_flags {
>     IBV_ENCAP_FLAGS_CSUM = 1 << 0,  /* HW offload to calc outer and
> inner CSUM for IP and TCP headers */
>     IBV_ENCAP_FLAGS_VXLAN_SPORT_ENTROPY = 1 << 1,  /* HW offload calc
> of UDP src port entropy value based on inner packet headers */
> };

I wonder if this is general enough

I would suggest something more like a header set and then a mask, like
we use for the flow matching. Except instead of matching the mask
indicates what bytes the HW is expected to fill in.

The driver scans the header and mask and figures out if it knows how
to fill in the mask'd values, or it fails to create it.

Not sure if a mask or an offset table & type is a better choice though..

> The providers might want to walk the users header to check what user
> is trying to do and match these with the flags provided.

I think this is essential, but the trick is that some times you may
want the HW to fill and sometimes not. Entropy is a good example of
this. Maybe an app wants fixed entropy for an entire flow.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Bloch April 10, 2018, 6:06 p.m. UTC | #6
On 10/04/2018 17:44, Jason Gunthorpe wrote:
> On Tue, Apr 10, 2018 at 05:31:06PM +0300, Alex Rosenbaum wrote:
>>> Well.. that is a big issue and needs careful explanation.
>>>
>>> How is the user space to know if the driver supports the 'special'
>>> transformations ?
>>>
>>> Seems like a pretty big gap.
>>>
>>
>> yes, HW capabilities is missing.
>> I suggest we replace the 'enum type' with 'int flags'.
>> enum ibv_flow_action_encap_flags {
>>     IBV_ENCAP_FLAGS_CSUM = 1 << 0,  /* HW offload to calc outer and
>> inner CSUM for IP and TCP headers */
>>     IBV_ENCAP_FLAGS_VXLAN_SPORT_ENTROPY = 1 << 1,  /* HW offload calc
>> of UDP src port entropy value based on inner packet headers */
>> };
> 
> I wonder if this is general enough

I think once we go down this road, we will end up with more flags than
there are stars in the sky, and the ability to mix and match flags
will be a nightmare to understand/handle.

> 
> I would suggest something more like a header set and then a mask, like
> we use for the flow matching. Except instead of matching the mask
> indicates what bytes the HW is expected to fill in.
> 
> The driver scans the header and mask and figures out if it knows how
> to fill in the mask'd values, or it fails to create it.
> 
> Not sure if a mask or an offset table & type is a better choice though..
> 

I think it makes the implementation/user experience kind of ugly, for example
when talking about vxlan encap in the kernel, it's pretty clear what
should happen (checksum, length adjustment etc), and using udp sport as a way to
provide entropy is standard stuff there. The common use case is well defined
if we assume it's derived from how the networking stack does things and the API
should be simple enough to be used.

Now if a vendor wants to provide extra customization, that's what DV is for.

>> The providers might want to walk the users header to check what user
>> is trying to do and match these with the flags provided.
> 
> I think this is essential, but the trick is that some times you may
> want the HW to fill and sometimes not. Entropy is a good example of
> this. Maybe an app wants fixed entropy for an entire flow
Not flow, but encap ID, as it can be attached to multiple flows, and
if a provider support this stuff, it should expose a DV for that.

I agree we might want to have specific types that are well defined
which have the same outcome across vendors, but that should be defined
on per type basis and not pilling flags one on top of each other.

> 
> Jason
> 

Mark
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe April 10, 2018, 10:28 p.m. UTC | #7
On Tue, Apr 10, 2018 at 09:06:03PM +0300, Mark Bloch wrote:

> I think once we go down this road, we will end up with more flags than
> there are stars in the sky, and the ability to mix and match flags
> will be a nightmare to understand/handle.

I suspect that also.

> > I would suggest something more like a header set and then a mask, like
> > we use for the flow matching. Except instead of matching the mask
> > indicates what bytes the HW is expected to fill in.
> > 
> > The driver scans the header and mask and figures out if it knows how
> > to fill in the mask'd values, or it fails to create it.
> > 
> > Not sure if a mask or an offset table & type is a better choice though..
> 
> I think it makes the implementation/user experience kind of ugly, for example
> when talking about vxlan encap in the kernel, it's pretty clear what
> should happen (checksum, length adjustment etc), and using udp sport as a way to
> provide entropy is standard stuff there. The common use case is well defined
> if we assume it's derived from how the networking stack does things and the API
> should be simple enough to be used.

Yes, but the problem is to define a way to ask if the driver supports
this 'common approach' or not.

So, we need something. What is the least ugly solution? Flags,
something like mask, or something else?

Can't just ignore it.

> > I think this is essential, but the trick is that some times you may
> > want the HW to fill and sometimes not. Entropy is a good example of
> > this. Maybe an app wants fixed entropy for an entire flow
>
> Not flow, but encap ID, as it can be attached to multiple flows, and
> if a provider support this stuff, it should expose a DV for that.

We should not be so quick to reach for DV - in a case like this it
seems to signal that the proposed API is not general enough.

I like the mask idea, it is pretty simple and I think it would
actually be easy enough for the user....

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Bloch April 11, 2018, 7:14 a.m. UTC | #8
On 11/04/2018 01:28, Jason Gunthorpe wrote:
> On Tue, Apr 10, 2018 at 09:06:03PM +0300, Mark Bloch wrote:
> 
>> I think once we go down this road, we will end up with more flags than
>> there are stars in the sky, and the ability to mix and match flags
>> will be a nightmare to understand/handle.
> 
> I suspect that also.
> 
>>> I would suggest something more like a header set and then a mask, like
>>> we use for the flow matching. Except instead of matching the mask
>>> indicates what bytes the HW is expected to fill in.
>>>
>>> The driver scans the header and mask and figures out if it knows how
>>> to fill in the mask'd values, or it fails to create it.
>>>
>>> Not sure if a mask or an offset table & type is a better choice though..
>>
>> I think it makes the implementation/user experience kind of ugly, for example
>> when talking about vxlan encap in the kernel, it's pretty clear what
>> should happen (checksum, length adjustment etc), and using udp sport as a way to
>> provide entropy is standard stuff there. The common use case is well defined
>> if we assume it's derived from how the networking stack does things and the API
>> should be simple enough to be used.
> 
> Yes, but the problem is to define a way to ask if the driver supports
> this 'common approach' or not.
> 
> So, we need something. What is the least ugly solution? Flags,
> something like mask, or something else?
> 
> Can't just ignore it.

Well, seeing as parsing a buffer into network headers doesn't seems that fun,
maybe:

struct hdr {
	enum header_type type;
	size_t len;
	struct hdr *next_hdr;
	u8 *hdr_buf;
	u8 *mask_buf;
}

so at least we have a hint what we should be parsing?

> 
>>> I think this is essential, but the trick is that some times you may
>>> want the HW to fill and sometimes not. Entropy is a good example of
>>> this. Maybe an app wants fixed entropy for an entire flow
>>
>> Not flow, but encap ID, as it can be attached to multiple flows, and
>> if a provider support this stuff, it should expose a DV for that.
> 
> We should not be so quick to reach for DV - in a case like this it
> seems to signal that the proposed API is not general enough.
> 
> I like the mask idea, it is pretty simple and I think it would
> actually be easy enough for the user....

What about the inner headers? for example, we have two option:
1) a user post a packet and tells the driver/hw do: "checksum offload"
2) A user post a packet without checksum offload

The HW needs to do encap on both those packets, does it support checksum offload
on inner headers? when should this feature/cap enforced?

Thinking about it more, the user sending the packets might not be the user that created
the steering rule that does the encap.

I'll take it internally, see if we can solve this in a sane way.

but my 0.02$ is that covering all the options will lead to some ugly code/API.

> 
> Jason
> 

Mark
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Rosenbaum April 11, 2018, 11 a.m. UTC | #9
> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgg@ziepe.ca]
> Sent: Wednesday, April 11, 2018 1:29 AM
> To: Mark Bloch <markb@mellanox.com>
> Cc: Alex Rosenbaum <rosenbaumalex@gmail.com>; linux-
> rdma@vger.kernel.org; Yishai Hadas <yishaih@mellanox.com>; Leon
> Romanovsky <leonro@mellanox.com>; Alex Rosenbaum
> <alexr@mellanox.com>
> Subject: Re: [PATCH RFC] New verbs API for flow action: ENCAP
> 
> On Tue, Apr 10, 2018 at 09:06:03PM +0300, Mark Bloch wrote:
> 
> > I think once we go down this road, we will end up with more flags than
> > there are stars in the sky, and the ability to mix and match flags
> > will be a nightmare to understand/handle.
> 
> I suspect that also.

I can't think of more to add besides the: CSUM, UDP_SPORT_ENTORPY
At this rate I don't think we'll ever reach 32 flags.


> > > I would suggest something more like a header set and then a mask,
> > > like we use for the flow matching. Except instead of matching the
> > > mask indicates what bytes the HW is expected to fill in.
> > >
> > > The driver scans the header and mask and figures out if it knows how
> > > to fill in the mask'd values, or it fails to create it.
> > >
> > > Not sure if a mask or an offset table & type is a better choice though..
> >
> > I think it makes the implementation/user experience kind of ugly, for
> > example when talking about vxlan encap in the kernel, it's pretty
> > clear what should happen (checksum, length adjustment etc), and using
> > udp sport as a way to provide entropy is standard stuff there. The
> > common use case is well defined if we assume it's derived from how the
> > networking stack does things and the API should be simple enough to be
> used.
> 
> Yes, but the problem is to define a way to ask if the driver supports this
> 'common approach' or not.
> 
> So, we need something. What is the least ugly solution? Flags, something like
> mask, or something else?
> 
> Can't just ignore it.
> 
> > > I think this is essential, but the trick is that some times you may
> > > want the HW to fill and sometimes not. Entropy is a good example of
> > > this. Maybe an app wants fixed entropy for an entire flow
> >
> > Not flow, but encap ID, as it can be attached to multiple flows, and
> > if a provider support this stuff, it should expose a DV for that.
> 
> We should not be so quick to reach for DV - in a case like this it seems to
> signal that the proposed API is not general enough.
> 
> I like the mask idea, it is pretty simple and I think it would actually be easy
> enough for the user....

few concerns regarding the buffer mask idea:
1. HW might have more than one way to modify some fields, bit mask blocks this approach. Like UDP entropy based different tuple parts L3 vs L3+L4
2. HW might allow to modify fields in the 'inner' packet. the mask covers only the ENCAL header part [I'm not sure if this is a real use-case]
3. the mask forces the driver implementation to scan the ENCAP header and the matching mask buffer, compared to only checking for support flags for the common case
4. seems harder to explain, compared to flags

On the pro side: this will not require future enum flags updates, just application to set more bits

Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe April 13, 2018, 4:09 a.m. UTC | #10
On Wed, Apr 11, 2018 at 11:00:14AM +0000, Alex Rosenbaum wrote:
 
> On the pro side: this will not require future enum flags updates,
> just application to set more bits

I think the pro is more that the application intent is super clear.

The problem with flags is the huge variety of headers and their
ordering and nesting. Flow steering is a good example of how things
are very confusing with flag combinations trying to specify which
place in the stack they are refering to.

If the app is already providing the exact bytes it expects then it is
very reasonable and natural for the application to specify directly
the bytes it expects the HW to fill in.

Totally clear and unambiguous.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/libibverbs/examples/encap_flow.c b/libibverbs/examples/encap_flow.c
new file mode 100644
index 0000000..1e8e547
--- /dev/null
+++ b/libibverbs/examples/encap_flow.c
@@ -0,0 +1,152 @@ 
+/*
+ * Copyright (c) 2018 Mellanox Technologies, Ltd.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <config.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#include <netinet/if_ether.h> //Provides declarations for ethernet header
+#include <netinet/udp.h> //Provides declarations for udp header
+#include <netinet/ip.h> //Provides declarations for ip header
+
+#include <infiniband/verbs.h>
+
+struct vxlanhdr {
+ __be32 flags;
+ __be32 vni;
+};
+
+static uint8_t buf[sizeof(struct ethhdr) +
+    sizeof(struct iphdr) +
+    sizeof(struct udphdr) +
+    sizeof(struct vxlanhdr)] = {};
+
+struct ethhdr eth = {.h_source = {0x52, 0x54, 0x00, 0xf7, 0x56, 0xe2},
+      .h_dest = {0x52, 0x54, 0x00, 0xf7, 0x56, 0xe2},
+      .h_proto = 0x0800};
+
+struct iphdr ip = {.version = 4,
+    .ihl = 5,
+    .tos = 0,
+    .tot_len = 0,
+    .id = 0x73a7,
+    .frag_off = 0x02,
+    .ttl = 64,
+    .protocol = 17,
+    .check = 0,
+    .saddr = (0x1d << 24) | (0x00 << 16) | (0x0a << 8) | 0x64,
+    .daddr = (0x1a << 24) | (0x00 << 16) | (0x0a << 8) | 0x78};
+
+struct udphdr udp = {.source = 0,
+                     .dest = 4789,
+      .len = 0,
+      .check = 0};
+
+struct vxlanhdr vxlan = {.flags = 0x0800,
+ .vni = 0x0028};
+
+static int create_encap_hdr(struct ibv_device *ib_dev)
+{
+ struct ibv_flow_action_encap_attr attr;
+ struct ibv_flow_action *action;
+ struct ethhdr *tmp_eth = (struct ethhdr *)buf;
+ struct iphdr  *tmp_ip = (struct iphdr *)(tmp_eth + 1);
+ struct udphdr *tmp_udp = (struct udphdr *)(tmp_ip + 1);
+ struct vxlanhdr *tmp_vxlan = (struct vxlanhdr *)(tmp_udp + 1);
+ struct ibv_context *ctx;
+ int ret = 0;
+
+ ctx = ibv_open_device(ib_dev);
+ if (!ctx) {
+ fprintf(stderr, "Failed to open device\n");
+ return ret;
+ }
+
+ memcpy(tmp_eth, &eth, sizeof(*tmp_eth));
+ memcpy(tmp_ip, &ip, sizeof(*tmp_ip));
+ memcpy(tmp_udp, &udp, sizeof(*tmp_udp));
+ memcpy(tmp_vxlan, &vxlan, sizeof(*tmp_vxlan));
+
+ attr.hdr_ptr = buf;
+ attr.hdr_len = sizeof(buf);
+ attr.hdr_proto = IBV_FLOW_ACTION_ENCAP_RAW;
+
+ action = ibv_create_flow_action_encap(ctx, &attr);
+
+ printf("%s- got action %p\n", __func__, action);
+ if (action)
+ ibv_destroy_flow_action(action);
+
+ ibv_close_device(ctx);
+
+ return ret;
+}
+
+int main(int argc, char *argv[])
+{
+ struct ibv_device      **dev_list;
+ struct ibv_device *ib_dev;
+ char *ib_devname = NULL;
+ int ret = 1;
+ int i;
+
+ dev_list = ibv_get_device_list(NULL);
+
+ if (!dev_list) {
+ perror("Failed to get IB devices list");
+ return 1;
+ }
+
+ if (argc != 2) {
+ perror("Didn't provide IB device name");
+ goto free_dev_list;
+ }
+
+ ib_devname = argv[1];
+ for (i = 0; dev_list[i]; ++i)
+ if (!strcmp(ibv_get_device_name(dev_list[i]), ib_devname))
+ break;
+
+ ib_dev = dev_list[i];
+ if (!ib_dev) {
+ fprintf(stderr, "IB device %s not found\n", ib_devname);
+ goto free_dev_list;
+ }
+
+ ret = create_encap_hdr(ib_dev);
+
+free_dev_list:
+ ibv_free_device_list(dev_list);
+
+ return ret;
+}
diff --git a/libibverbs/man/ibv_create_flow_action_encap.3.md
b/libibverbs/man/ibv_create_flow_action_encap.3.md
new file mode 100644
index 0000000..5ff18d9
--- /dev/null
+++ b/libibverbs/man/ibv_create_flow_action_encap.3.md
@@ -0,0 +1,82 @@ 
+---
+date: 2018-03-27
+footer: libibverbs
+header: "Libibverbs Programmer's Manual"
+layout: page
+license: 'Licensed under the OpenIB.org BSD license (FreeBSD Variant)
- See COPYING.md'
+section: 3
+title: ibv_flow_action_encap(3)
+tagline: Verbs
+---
+
+# NAME
+
+ibv_flow_action_encap(3) -- Flow action encap for verbs(3)
+
+# SYNOPSIS
+
+```c
+#include <infiniband/verbs.h>
+
+struct ibv_flow_action *
+ibv_create_flow_action_encap(struct ibv_context *ctx,
+      struct ibv_flow_action_encap_attr *attr);
+
+int ibv_destroy_flow_action(struct ibv_flow_action *action);
+```
+
+# DESCRIPTION
+
+An ENCAP flow steering action allows a flow steering rule to encapsulate
+a packet after matching.
+
+After the action is created, it should be associated with a *struct
+ibv_flow_attr* using *struct ibv_flow_spec_action_handle* flow specification.
+An action can be associated with multiple flows.
+
+# ARGUMENTS
+
+*ctx*
+: RDMA device context to create the action on.
+
+*attr*
+: ENCAP type and header to be used.
+
+*action*
+: Existing action to be destroyed.
+
+## *action* Argument
+
+```c
+struct ibv_flow_action_encap_attr {
+ enum ibv_flow_action_encap hdr_proto;
+ uint16_t                hdr_len;
+ void                    *hdr_ptr;
+};
+```
+
+*hdr_proto*
+: Encapsulation protocol to be used.
+
+*hdr_len*
+: Length in bytes of the provided encap headers.
+
+*hdr_ptr*
+: Buffing containing the encap headers.
+
+## Encapsulate protocols (*ibv_flow_action_encap*)
+
+A user can create flow actions that implement different encapsulation
+protocols.
+
+*IBV_FLOW_ACTION_ENCAP_RAW* provides the ability to provide unspecified
+encapsulation protocol, this can be expressed by passing it in *hdr_proto*.
+
+# RETURN VALUE
+
+Upon success *ibv_create_flow_action_encap* will return a new *struct
+ibv_flow_action* object, on error NULL will be returned and errno will be set.
+
+# SEE ALSO
+
+*ibv_create_flow(3)*,  *ibv_destroy_flow_action(3)*
diff --git a/libibverbs/verbs.h b/libibverbs/verbs.h
index eb57824..2f5fc8f 100644
--- a/libibverbs/verbs.h
+++ b/libibverbs/verbs.h
@@ -44,6 +44,7 @@ 
 #include <linux/types.h>
 #include <stdint.h>
 #include <infiniband/verbs_api.h>
+#include <stdio.h>

 #ifdef __cplusplus
 #include <limits>
@@ -1583,6 +1584,12 @@  struct ibv_flow_action_esp_attr {
  uint32_t esn;
 };

+struct ibv_flow_action_encap_attr {
+ enum ibv_flow_action_encap hdr_proto;
+ uint16_t                hdr_len;
+ void                    *hdr_ptr;
+};
+
 struct ibv_device;
 struct ibv_context;

@@ -1737,6 +1737,8 @@  struct ibv_values_ex {

 struct verbs_context {
        /*  "grows up" - new fields go here */
+       struct ibv_flow_action *(*create_flow_action_encap)(struct
ibv_context *context,
+                                                           struct
ibv_flow_action_encap_attr *attr);
        struct ibv_mr *(*reg_dm_mr)(struct ibv_pd *pd, struct ibv_dm *dm,