
[bpf-next,v4,1/3] bpf: support input xdp_md context in BPF_PROG_TEST_RUN

Message ID 20210604220235.6758-2-zeffron@riotgames.com (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series bpf: support input xdp_md context in BPF_PROG_TEST_RUN | expand

Checks

Context Check Description
netdev/cover_letter success
netdev/fixes_present success
netdev/patch_count success
netdev/tree_selection success Clearly marked for bpf-next
netdev/subject_prefix success
netdev/cc_maintainers warning 7 maintainers not CCed: netdev@vger.kernel.org yhs@fb.com kpsingh@kernel.org andrii@kernel.org john.fastabend@gmail.com songliubraving@fb.com kuba@kernel.org
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success
netdev/module_param success Was 0 now: 0
netdev/build_32bit fail Errors and warnings before: 10043 this patch: 10044
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success
netdev/checkpatch warning WARNING: line length of 83 exceeds 80 columns WARNING: line length of 85 exceeds 80 columns
netdev/build_allmodconfig_warn fail Errors and warnings before: 10457 this patch: 10458
netdev/header_inline success

Commit Message

Zvi Effron June 4, 2021, 10:02 p.m. UTC
Support passing an xdp_md via ctx_in/ctx_out in bpf_attr for
BPF_PROG_TEST_RUN.

The intended use case is to pass some XDP metadata to the test runs of
XDP programs that are used as tail calls.

For programs that use bpf_prog_test_run_xdp, support xdp_md input and
output. Unlike with an actual xdp_md during a non-test run, data_meta must
be 0 because it must point to the start of the provided user data. From
the initial xdp_md, use data and data_end to adjust the pointers in the
generated xdp_buff. All other fields must be zero; otherwise EINVAL is
returned. If the user has set ctx_out/ctx_size_out, copy the (potentially
different) xdp_md back to userspace.

We require all fields of the input xdp_md, except the ones we explicitly
support, to be set to zero. The expectation is that in the future we might
add support for more fields, and we want to fail explicitly if the user
runs the program on a kernel where we don't yet support them.

Co-developed-by: Cody Haas <chaas@riotgames.com>
Signed-off-by: Cody Haas <chaas@riotgames.com>
Co-developed-by: Lisa Watanabe <lwatanabe@riotgames.com>
Signed-off-by: Lisa Watanabe <lwatanabe@riotgames.com>
Signed-off-by: Zvi Effron <zeffron@riotgames.com>
---
 include/uapi/linux/bpf.h |  3 --
 net/bpf/test_run.c       | 77 ++++++++++++++++++++++++++++++++++++----
 2 files changed, 70 insertions(+), 10 deletions(-)
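
For illustration, here is a minimal sketch of how userspace might drive the
new interface. It assumes a libbpf that provides bpf_prog_test_run_opts()
and an already-loaded BPF_PROG_TYPE_XDP program (prog_fd); the buffer
contents and the 8-byte metadata length are arbitrary choices for the
example, not part of the patch.

#include <bpf/bpf.h>
#include <linux/bpf.h>

/* Sketch only: run an XDP program with 8 bytes of metadata in front of
 * the packet. The first ctx_in.data bytes of data_in become the XDP
 * metadata; data_meta stays 0 and data_end equals data_size_in, as this
 * patch requires.
 */
static int run_xdp_test(int prog_fd)
{
	unsigned char buf[72] = {};	/* 8 bytes metadata + 64-byte packet; fill in real packet bytes */
	struct xdp_md ctx_in = {}, ctx_out = {};
	DECLARE_LIBBPF_OPTS(bpf_test_run_opts, opts,
		.data_in = buf,
		.data_size_in = sizeof(buf),
		.ctx_in = &ctx_in,
		.ctx_size_in = sizeof(ctx_in),
		.ctx_out = &ctx_out,
		.ctx_size_out = sizeof(ctx_out),
	);
	int err;

	ctx_in.data = 8;		/* metadata length: multiple of 4, at most 32 */
	ctx_in.data_end = sizeof(buf);	/* must equal data_size_in */

	err = bpf_prog_test_run_opts(prog_fd, &opts);
	if (err)
		return err;

	/* opts.retval holds the XDP verdict; ctx_out.data/data_end reflect
	 * any pointer adjustments the program made.
	 */
	return 0;
}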

Comments

Yonghong Song June 6, 2021, 3:17 a.m. UTC | #1
On 6/4/21 3:02 PM, Zvi Effron wrote:
> Support passing a xdp_md via ctx_in/ctx_out in bpf_attr for
> BPF_PROG_TEST_RUN.
> 
> The intended use case is to pass some XDP meta data to the test runs of
> XDP programs that are used as tail calls.
> 
> For programs that use bpf_prog_test_run_xdp, support xdp_md input and
> output. Unlike with an actual xdp_md during a non-test run, data_meta must
> be 0 because it must point to the start of the provided user data. From
> the initial xdp_md, use data and data_end to adjust the pointers in the
> generated xdp_buff. All other non-zero fields are prohibited (with
> EINVAL). If the user has set ctx_out/ctx_size_out, copy the (potentially
> different) xdp_md back to the userspace.
> 
> We require all fields of input xdp_md except the ones we explicitly
> support to be set to zero. The expectation is that in the future we might
> add support for more fields and we want to fail explicitly if the user
> runs the program on the kernel where we don't yet support them.
> 
> Co-developed-by: Cody Haas <chaas@riotgames.com>
> Signed-off-by: Cody Haas <chaas@riotgames.com>
> Co-developed-by: Lisa Watanabe <lwatanabe@riotgames.com>
> Signed-off-by: Lisa Watanabe <lwatanabe@riotgames.com>
> Signed-off-by: Zvi Effron <zeffron@riotgames.com>
> ---
>   include/uapi/linux/bpf.h |  3 --
>   net/bpf/test_run.c       | 77 ++++++++++++++++++++++++++++++++++++----
>   2 files changed, 70 insertions(+), 10 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 2c1ba70abbf1..a9dcf3d8c85a 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -324,9 +324,6 @@ union bpf_iter_link_info {
>    *		**BPF_PROG_TYPE_SK_LOOKUP**
>    *			*data_in* and *data_out* must be NULL.
>    *
> - *		**BPF_PROG_TYPE_XDP**
> - *			*ctx_in* and *ctx_out* must be NULL.
> - *
>    *		**BPF_PROG_TYPE_RAW_TRACEPOINT**,
>    *		**BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE**
>    *
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index aa47af349ba8..698618f2b27e 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -687,6 +687,38 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>   	return ret;
>   }
>   
> +static int xdp_convert_md_to_buff(struct xdp_buff *xdp, struct xdp_md *xdp_md)

Should the order of parameters be switched to (xdp_md, xdp)?
This would follow the convention of the xdp_convert_buff_to_md() function below.

> +{
> +	void *data;
> +
> +	if (!xdp_md)
> +		return 0;
> +
> +	if (xdp_md->egress_ifindex != 0)
> +		return -EINVAL;
> +
> +	if (xdp_md->data > xdp_md->data_end)
> +		return -EINVAL;
> +
> +	xdp->data = xdp->data_meta + xdp_md->data;
> +
> +	if (xdp_md->ingress_ifindex != 0 || xdp_md->rx_queue_index != 0)
> +		return -EINVAL;

It would be good if you did all error checking before the xdp->data
assignment. It also looks like xdp_md error checking happens both here and
in bpf_prog_test_run_xdp(). If it is hard to put all error checking
in bpf_prog_test_run_xdp(), at least put the "xdp_md->data >
xdp_md->data_end" check in bpf_prog_test_run_xdp(), so this function only
checks *_ifindex and rx_queue_index?


> +
> +	return 0;
> +}
> +
> +static void xdp_convert_buff_to_md(struct xdp_buff *xdp, struct xdp_md *xdp_md)
> +{
> +	if (!xdp_md)
> +		return;
> +
> +	/* xdp_md->data_meta must always point to the start of the out buffer */
> +	xdp_md->data_meta = 0;
> +	xdp_md->data = xdp->data - xdp->data_meta;
> +	xdp_md->data_end = xdp->data_end - xdp->data_meta;
> +}
> +
>   int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
>   			  union bpf_attr __user *uattr)
>   {
> @@ -696,36 +728,68 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
>   	u32 repeat = kattr->test.repeat;
>   	struct netdev_rx_queue *rxqueue;
>   	struct xdp_buff xdp = {};
> +	struct xdp_md *ctx;

Let us try to maintain reverse Christmas tree ordering?

>   	u32 retval, duration;
>   	u32 max_data_sz;
>   	void *data;
>   	int ret;
>   
> -	if (kattr->test.ctx_in || kattr->test.ctx_out)
> -		return -EINVAL;
> +	ctx = bpf_ctx_init(kattr, sizeof(struct xdp_md));
> +	if (IS_ERR(ctx))
> +		return PTR_ERR(ctx);
> +
> +	/* There can't be user provided data before the metadata */
> +	if (ctx) {
> +		if (ctx->data_meta)
> +			return -EINVAL;
> +		if (ctx->data_end != size)
> +			return -EINVAL;
> +		if (unlikely((ctx->data & (sizeof(__u32) - 1)) ||
> +			     ctx->data > 32))

Why 32? Should it be sizeof(struct xdp_md)?

> +			return -EINVAL;

As I mentioned in earlier comments, it would be good if we could
do some or all of the input parameter validation here.

> +		/* Metadata is allocated from the headroom */
> +		headroom -= ctx->data;

sizeof(struct xdp_md) should be smaller than the headroom
(XDP_PACKET_HEADROOM), so we don't need a check, but
a comment might be helpful so people looking at the
code don't need to double check.

> +	}
>   
>   	/* XDP have extra tailroom as (most) drivers use full page */
>   	max_data_sz = 4096 - headroom - tailroom;
>   
>   	data = bpf_test_init(kattr, max_data_sz, headroom, tailroom);
> -	if (IS_ERR(data))
> +	if (IS_ERR(data)) {
> +		kfree(ctx);
>   		return PTR_ERR(data);
> +	}
>   
>   	rxqueue = __netif_get_rx_queue(current->nsproxy->net_ns->loopback_dev, 0);
>   	xdp_init_buff(&xdp, headroom + max_data_sz + tailroom,
>   		      &rxqueue->xdp_rxq);
>   	xdp_prepare_buff(&xdp, data, headroom, size, true);
>   
> +	ret = xdp_convert_md_to_buff(&xdp, ctx);
> +	if (ret) {
> +		kfree(data);
> +		kfree(ctx);
> +		return ret;
> +	}
> +
>   	bpf_prog_change_xdp(NULL, prog);
>   	ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true);
>   	if (ret)
>   		goto out;
> -	if (xdp.data != data + headroom || xdp.data_end != xdp.data + size)
> -		size = xdp.data_end - xdp.data;
> -	ret = bpf_test_finish(kattr, uattr, xdp.data, size, retval, duration);
> +
> +	if (xdp.data_meta != data + headroom || xdp.data_end != xdp.data_meta + size)
> +		size = xdp.data_end - xdp.data_meta;
> +
> +	xdp_convert_buff_to_md(&xdp, ctx);
> +
> +	ret = bpf_test_finish(kattr, uattr, xdp.data_meta, size, retval, duration);
> +	if (!ret)
> +		ret = bpf_ctx_finish(kattr, uattr, ctx,
> +				     sizeof(struct xdp_md));
>   out:
>   	bpf_prog_change_xdp(prog, NULL);
>   	kfree(data);
> +	kfree(ctx);
>   	return ret;
>   }
>   
> @@ -809,7 +873,6 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
>   	if (!ret)
>   		ret = bpf_ctx_finish(kattr, uattr, user_ctx,
>   				     sizeof(struct bpf_flow_keys));
> -
>   out:
>   	kfree(user_ctx);
>   	kfree(data);
>
Martin KaFai Lau June 7, 2021, 5:58 p.m. UTC | #2
On Sat, Jun 05, 2021 at 08:17:00PM -0700, Yonghong Song wrote:
> 
> 
> On 6/4/21 3:02 PM, Zvi Effron wrote:
> > Support passing a xdp_md via ctx_in/ctx_out in bpf_attr for
> > BPF_PROG_TEST_RUN.
> > 
> > The intended use case is to pass some XDP meta data to the test runs of
> > XDP programs that are used as tail calls.
> > 
> > For programs that use bpf_prog_test_run_xdp, support xdp_md input and
> > output. Unlike with an actual xdp_md during a non-test run, data_meta must
> > be 0 because it must point to the start of the provided user data. From
> > the initial xdp_md, use data and data_end to adjust the pointers in the
> > generated xdp_buff. All other non-zero fields are prohibited (with
> > EINVAL). If the user has set ctx_out/ctx_size_out, copy the (potentially
> > different) xdp_md back to the userspace.
> > 
> > We require all fields of input xdp_md except the ones we explicitly
> > support to be set to zero. The expectation is that in the future we might
> > add support for more fields and we want to fail explicitly if the user
> > runs the program on the kernel where we don't yet support them.
> > 
> > Co-developed-by: Cody Haas <chaas@riotgames.com>
> > Signed-off-by: Cody Haas <chaas@riotgames.com>
> > Co-developed-by: Lisa Watanabe <lwatanabe@riotgames.com>
> > Signed-off-by: Lisa Watanabe <lwatanabe@riotgames.com>
> > Signed-off-by: Zvi Effron <zeffron@riotgames.com>
> > ---
> >   include/uapi/linux/bpf.h |  3 --
> >   net/bpf/test_run.c       | 77 ++++++++++++++++++++++++++++++++++++----
> >   2 files changed, 70 insertions(+), 10 deletions(-)
> > 
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 2c1ba70abbf1..a9dcf3d8c85a 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -324,9 +324,6 @@ union bpf_iter_link_info {
> >    *		**BPF_PROG_TYPE_SK_LOOKUP**
> >    *			*data_in* and *data_out* must be NULL.
> >    *
> > - *		**BPF_PROG_TYPE_XDP**
> > - *			*ctx_in* and *ctx_out* must be NULL.
> > - *
> >    *		**BPF_PROG_TYPE_RAW_TRACEPOINT**,
> >    *		**BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE**
> >    *
> > diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> > index aa47af349ba8..698618f2b27e 100644
> > --- a/net/bpf/test_run.c
> > +++ b/net/bpf/test_run.c
> > @@ -687,6 +687,38 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
> >   	return ret;
> >   }
> > +static int xdp_convert_md_to_buff(struct xdp_buff *xdp, struct xdp_md *xdp_md)
> 
> Should the order of parameters be switched to (xdp_md, xdp)?
> This will follow the convention of below function xdp_convert_buff_to_md().
> 
> > +{
> > +	void *data;
> > +
> > +	if (!xdp_md)
> > +		return 0;
> > +
> > +	if (xdp_md->egress_ifindex != 0)
> > +		return -EINVAL;
> > +
> > +	if (xdp_md->data > xdp_md->data_end)
> > +		return -EINVAL;
> > +
> > +	xdp->data = xdp->data_meta + xdp_md->data;
> > +
> > +	if (xdp_md->ingress_ifindex != 0 || xdp_md->rx_queue_index != 0)
> > +		return -EINVAL;
> 
> It would be good if you did all error checking before doing xdp->data
> assignment. Also looks like xdp_md error checking happens here and
> bpf_prog_test_run_xdp(). If it is hard to put all error checking
> in bpf_prog_test_run_xdp(), at least put "xdp_md->data > xdp_md->data_end)
> in bpf_prog_test_run_xdp(),
+1 on at least having all data_meta/data/data_end checks in one place
in bpf_prog_test_run_xdp().

> so this function only
> checks *_ifindex and rx_queue_index?
> 
> 
> > +
> > +	return 0;
> > +}
> > +
> > +static void xdp_convert_buff_to_md(struct xdp_buff *xdp, struct xdp_md *xdp_md)
> > +{
> > +	if (!xdp_md)
> > +		return;
> > +
> > +	/* xdp_md->data_meta must always point to the start of the out buffer */
> > +	xdp_md->data_meta = 0;
Is this necessary?  data_meta should not have been changed.

> > +	xdp_md->data = xdp->data - xdp->data_meta;
> > +	xdp_md->data_end = xdp->data_end - xdp->data_meta;
> > +}
> > +
Zvi Effron June 9, 2021, 5:06 p.m. UTC | #3
On Sat, Jun 5, 2021 at 10:17 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 6/4/21 3:02 PM, Zvi Effron wrote:
> > --- a/net/bpf/test_run.c
> > +++ b/net/bpf/test_run.c
> > @@ -687,6 +687,38 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
> >       return ret;
> >   }
> >
> > +static int xdp_convert_md_to_buff(struct xdp_buff *xdp, struct xdp_md *xdp_md)
>
> Should the order of parameters be switched to (xdp_md, xdp)?
> This will follow the convention of below function xdp_convert_buff_to_md().
>

The order was done to match the skb versions of these functions, which seem to
have the output format first and the input format second, which is why the
order flips between conversion functions. We're not particular about order, so
we can definitely make it consistent.

> > +{
> > +     void *data;
> > +
> > +     if (!xdp_md)
> > +             return 0;
> > +
> > +     if (xdp_md->egress_ifindex != 0)
> > +             return -EINVAL;
> > +
> > +     if (xdp_md->data > xdp_md->data_end)
> > +             return -EINVAL;
> > +
> > +     xdp->data = xdp->data_meta + xdp_md->data;
> > +
> > +     if (xdp_md->ingress_ifindex != 0 || xdp_md->rx_queue_index != 0)
> > +             return -EINVAL;
>
> It would be good if you did all error checking before doing xdp->data
> assignment. Also looks like xdp_md error checking happens here and
> bpf_prog_test_run_xdp(). If it is hard to put all error checking
> in bpf_prog_test_run_xdp(), at least put "xdp_md->data >
> xdp_md->data_end) in bpf_prog_test_run_xdp(), so this function only
> checks *_ifindex and rx_queue_index?
>

bpf_prog_test_run_xdp() was already a large function, which is why this was
turned into a helper. Initially, we tried to have all xdp_md related logic in
the helper, with only the required logic in bpf_prog_test_run_xdp(). Based on
a prior suggestion, we moved one additional check from the helper to
bpf_prog_test_run_xdp() as it simplified the logic. It's not clear to us what
benefit moving the other checks to bpf_prog_test_run_xdp() provides, but it
does reduce the benefit of having the helper function.

> > @@ -696,36 +728,68 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
> >       u32 repeat = kattr->test.repeat;
> >       struct netdev_rx_queue *rxqueue;
> >       struct xdp_buff xdp = {};
> > +     struct xdp_md *ctx;
>
> Let us try to maintain reverse christmas tree?

Sure.


>
> >       u32 retval, duration;
> >       u32 max_data_sz;
> >       void *data;
> >       int ret;
> >
> > -     if (kattr->test.ctx_in || kattr->test.ctx_out)
> > -             return -EINVAL;
> > +     ctx = bpf_ctx_init(kattr, sizeof(struct xdp_md));
> > +     if (IS_ERR(ctx))
> > +             return PTR_ERR(ctx);
> > +
> > +     /* There can't be user provided data before the metadata */
> > +     if (ctx) {
> > +             if (ctx->data_meta)
> > +                     return -EINVAL;
> > +             if (ctx->data_end != size)
> > +                     return -EINVAL;
> > +             if (unlikely((ctx->data & (sizeof(__u32) - 1)) ||
> > +                          ctx->data > 32))
>
> Why 32? Should it be sizeof(struct xdp_md)?

This is not checking the context itself, but the amount of metadata. XDP allows
at most 32 bytes of metadata.

>
> > +             /* Metadata is allocated from the headroom */
> > +             headroom -= ctx->data;
>
> sizeof(struct xdp_md) should be smaller than headroom
> (XDP_PACKET_HEADROOM), so we don't need to a check, but
> some comments might be helpful so people looking at the
> code doesn't need to double check.

We're not sure what check you're referring to, as there's no check here. This
subtraction is, as the comment says, because the XDP metadata is allocated out
of the XDP headroom, so the headroom size needs to be reduced by the metadata
size.
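
To make the metadata convention discussed above concrete, here is a minimal
sketch (not part of the patch) of an XDP program that reads a 4-byte value
from the metadata area a test run would place between data_meta and data;
the program and section names are illustrative only.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int read_test_meta(struct xdp_md *ctx)
{
	void *data_meta = (void *)(long)ctx->data_meta;
	void *data = (void *)(long)ctx->data;
	__u32 *val = data_meta;

	/* In a test run with ctx_in.data >= 4, the bytes between data_meta
	 * and data are the metadata supplied at the front of data_in.
	 */
	if ((void *)(val + 1) > data)
		return XDP_ABORTED;

	return *val ? XDP_PASS : XDP_DROP;
}

char _license[] SEC("license") = "GPL";
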
Yonghong Song June 10, 2021, 12:07 a.m. UTC | #4
On 6/9/21 10:06 AM, Zvi Effron wrote:
> On Sat, Jun 5, 2021 at 10:17 PM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 6/4/21 3:02 PM, Zvi Effron wrote:
>>> --- a/net/bpf/test_run.c
>>> +++ b/net/bpf/test_run.c
>>> @@ -687,6 +687,38 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>>>        return ret;
>>>    }
>>>
>>> +static int xdp_convert_md_to_buff(struct xdp_buff *xdp, struct xdp_md *xdp_md)
>>
>> Should the order of parameters be switched to (xdp_md, xdp)?
>> This will follow the convention of below function xdp_convert_buff_to_md().
>>
> 
> The order was done to match the skb versions of these functions, which seem to
> have the output format first and the input format second, which is why the
> order flips between conversion functions. We're not particular about order, so
> we can definitely make it consistent.

But for the other function we have

+static void xdp_convert_buff_to_md(struct xdp_buff *xdp, struct xdp_md *xdp_md)

with the input first and the output second. In my opinion, within the same
file, we should keep the same ordering convention.

> 
>>> +{
>>> +     void *data;
>>> +
>>> +     if (!xdp_md)
>>> +             return 0;
>>> +
>>> +     if (xdp_md->egress_ifindex != 0)
>>> +             return -EINVAL;
>>> +
>>> +     if (xdp_md->data > xdp_md->data_end)
>>> +             return -EINVAL;
>>> +
>>> +     xdp->data = xdp->data_meta + xdp_md->data;
>>> +
>>> +     if (xdp_md->ingress_ifindex != 0 || xdp_md->rx_queue_index != 0)
>>> +             return -EINVAL;
>>
>> It would be good if you did all error checking before doing xdp->data
>> assignment. Also looks like xdp_md error checking happens here and
>> bpf_prog_test_run_xdp(). If it is hard to put all error checking
>> in bpf_prog_test_run_xdp(), at least put "xdp_md->data >
>> xdp_md->data_end) in bpf_prog_test_run_xdp(), so this function only
>> checks *_ifindex and rx_queue_index?
>>
> 
> bpf_prog_test_run_xdp() was already a large function, which is why this was
> turned into a helper. Initially, we tried to have all xdp_md related logic in
> the helper, with only the required logic in bpf_prog_test_run_xdp(). Based on
> a prior suggestion, we moved one additional check from the helper to
> bpf_prog_test_run_xdp() as it simplified the logic. It's not clear to us what
> benefit moving the other checks to bpf_prog_test_run_xdp() provides, but it
> does reduce the benefit of having the helper function.

At least put "if (xdp_md->data > xdp_md->data_end)" checking in the 
bpf_prog_test_run_xdp() as similar fields are already checked there.

It is okay to put *_ifindex/rx_queue_index in this function since you 
need to get device for checking and there is no need to get device twice.

> 
>>> @@ -696,36 +728,68 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
>>>        u32 repeat = kattr->test.repeat;
>>>        struct netdev_rx_queue *rxqueue;
>>>        struct xdp_buff xdp = {};
>>> +     struct xdp_md *ctx;
>>
>> Let us try to maintain reverse christmas tree?
> 
> Sure.
> 
> 
>>
>>>        u32 retval, duration;
>>>        u32 max_data_sz;
>>>        void *data;
>>>        int ret;
>>>
>>> -     if (kattr->test.ctx_in || kattr->test.ctx_out)
>>> -             return -EINVAL;
>>> +     ctx = bpf_ctx_init(kattr, sizeof(struct xdp_md));
>>> +     if (IS_ERR(ctx))
>>> +             return PTR_ERR(ctx);
>>> +
>>> +     /* There can't be user provided data before the metadata */
>>> +     if (ctx) {
>>> +             if (ctx->data_meta)
>>> +                     return -EINVAL;
>>> +             if (ctx->data_end != size)
>>> +                     return -EINVAL;
>>> +             if (unlikely((ctx->data & (sizeof(__u32) - 1)) ||
>>> +                          ctx->data > 32))
>>
>> Why 32? Should it be sizeof(struct xdp_md)?
> 
> This is not checking the context itself, but the amount of metadata. XDP allows
> at most 32 bytes of metadata.

Do we have a macro for this "32"? It would be good if we had one.
Otherwise, a comment would be good.

Previously I was thinking to just enforce ctx->data to be
sizeof(struct xdp_md). But on second thought, that is a little too
restrictive, so your current handling is fine.

> 
>>
>>> +             /* Metadata is allocated from the headroom */
>>> +             headroom -= ctx->data;
>>
>> sizeof(struct xdp_md) should be smaller than headroom
>> (XDP_PACKET_HEADROOM), so we don't need to a check, but
>> some comments might be helpful so people looking at the
>> code doesn't need to double check.
> 
> We're not sure what check you're referring to, as there's no check here. This
> subtraction is, as the comment says, because the XDP metadata is allocated out
> of the XDP headroom, so the headroom size needs to be reduced by the metadata
> size.

I am wondering whether we need to check
   if (headroom  < ctx->data)
      return -EINVAL;
   headroom -= ctx->data;
We have
   headroom = XDP_PACKET_HEADROOM;
   ctx->data <= 32
so we should be okay.
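
For readers following the discussion, a rough sketch of the consolidation
suggested above. This is hypothetical: the parameter order, the free_ctx
label, and the grouping of checks are assumptions and do not match the
posted patch as-is.

/* In bpf_prog_test_run_xdp(), right after bpf_ctx_init(); the assumed
 * free_ctx label would kfree(ctx) and return -EINVAL.
 */
	if (ctx) {
		/* There can't be user provided data before the metadata */
		if (ctx->data_meta || ctx->data_end != size ||
		    ctx->data > ctx->data_end ||
		    unlikely((ctx->data & (sizeof(__u32) - 1)) ||
			     ctx->data > 32 /* max XDP metadata size */))
			goto free_ctx;
		/* Metadata is allocated from the headroom */
		headroom -= ctx->data;
	}

/* The helper then only validates the fields that need no size context. */
static int xdp_convert_md_to_buff(struct xdp_md *xdp_md, struct xdp_buff *xdp)
{
	if (!xdp_md)
		return 0;

	if (xdp_md->egress_ifindex || xdp_md->ingress_ifindex ||
	    xdp_md->rx_queue_index)
		return -EINVAL;

	xdp->data = xdp->data_meta + xdp_md->data;
	return 0;
}
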

Patch

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2c1ba70abbf1..a9dcf3d8c85a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -324,9 +324,6 @@  union bpf_iter_link_info {
  *		**BPF_PROG_TYPE_SK_LOOKUP**
  *			*data_in* and *data_out* must be NULL.
  *
- *		**BPF_PROG_TYPE_XDP**
- *			*ctx_in* and *ctx_out* must be NULL.
- *
  *		**BPF_PROG_TYPE_RAW_TRACEPOINT**,
  *		**BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE**
  *
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index aa47af349ba8..698618f2b27e 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -687,6 +687,38 @@  int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 	return ret;
 }
 
+static int xdp_convert_md_to_buff(struct xdp_buff *xdp, struct xdp_md *xdp_md)
+{
+	void *data;
+
+	if (!xdp_md)
+		return 0;
+
+	if (xdp_md->egress_ifindex != 0)
+		return -EINVAL;
+
+	if (xdp_md->data > xdp_md->data_end)
+		return -EINVAL;
+
+	xdp->data = xdp->data_meta + xdp_md->data;
+
+	if (xdp_md->ingress_ifindex != 0 || xdp_md->rx_queue_index != 0)
+		return -EINVAL;
+
+	return 0;
+}
+
+static void xdp_convert_buff_to_md(struct xdp_buff *xdp, struct xdp_md *xdp_md)
+{
+	if (!xdp_md)
+		return;
+
+	/* xdp_md->data_meta must always point to the start of the out buffer */
+	xdp_md->data_meta = 0;
+	xdp_md->data = xdp->data - xdp->data_meta;
+	xdp_md->data_end = xdp->data_end - xdp->data_meta;
+}
+
 int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 			  union bpf_attr __user *uattr)
 {
@@ -696,36 +728,68 @@  int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 	u32 repeat = kattr->test.repeat;
 	struct netdev_rx_queue *rxqueue;
 	struct xdp_buff xdp = {};
+	struct xdp_md *ctx;
 	u32 retval, duration;
 	u32 max_data_sz;
 	void *data;
 	int ret;
 
-	if (kattr->test.ctx_in || kattr->test.ctx_out)
-		return -EINVAL;
+	ctx = bpf_ctx_init(kattr, sizeof(struct xdp_md));
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	/* There can't be user provided data before the metadata */
+	if (ctx) {
+		if (ctx->data_meta)
+			return -EINVAL;
+		if (ctx->data_end != size)
+			return -EINVAL;
+		if (unlikely((ctx->data & (sizeof(__u32) - 1)) ||
+			     ctx->data > 32))
+			return -EINVAL;
+		/* Metadata is allocated from the headroom */
+		headroom -= ctx->data;
+	}
 
 	/* XDP have extra tailroom as (most) drivers use full page */
 	max_data_sz = 4096 - headroom - tailroom;
 
 	data = bpf_test_init(kattr, max_data_sz, headroom, tailroom);
-	if (IS_ERR(data))
+	if (IS_ERR(data)) {
+		kfree(ctx);
 		return PTR_ERR(data);
+	}
 
 	rxqueue = __netif_get_rx_queue(current->nsproxy->net_ns->loopback_dev, 0);
 	xdp_init_buff(&xdp, headroom + max_data_sz + tailroom,
 		      &rxqueue->xdp_rxq);
 	xdp_prepare_buff(&xdp, data, headroom, size, true);
 
+	ret = xdp_convert_md_to_buff(&xdp, ctx);
+	if (ret) {
+		kfree(data);
+		kfree(ctx);
+		return ret;
+	}
+
 	bpf_prog_change_xdp(NULL, prog);
 	ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true);
 	if (ret)
 		goto out;
-	if (xdp.data != data + headroom || xdp.data_end != xdp.data + size)
-		size = xdp.data_end - xdp.data;
-	ret = bpf_test_finish(kattr, uattr, xdp.data, size, retval, duration);
+
+	if (xdp.data_meta != data + headroom || xdp.data_end != xdp.data_meta + size)
+		size = xdp.data_end - xdp.data_meta;
+
+	xdp_convert_buff_to_md(&xdp, ctx);
+
+	ret = bpf_test_finish(kattr, uattr, xdp.data_meta, size, retval, duration);
+	if (!ret)
+		ret = bpf_ctx_finish(kattr, uattr, ctx,
+				     sizeof(struct xdp_md));
 out:
 	bpf_prog_change_xdp(prog, NULL);
 	kfree(data);
+	kfree(ctx);
 	return ret;
 }
 
@@ -809,7 +873,6 @@  int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 	if (!ret)
 		ret = bpf_ctx_finish(kattr, uattr, user_ctx,
 				     sizeof(struct bpf_flow_keys));
-
 out:
 	kfree(user_ctx);
 	kfree(data);