diff mbox series

[net-next] memory-provider: fix compilation issue without SYSFS

Message ID 20240912-net-next-fix-get_netdev_rx_queue_index-v1-1-d73a1436be8c@kernel.org (mailing list archive)
State Accepted, archived
Commit 01787be49b25bdc3d64d491a194147f6a5ea46a9
Delegated to: Matthieu Baerts
Headers show
Series [net-next] memory-provider: fix compilation issue without SYSFS | expand

Checks

Context Check Description
matttbe/build success Build and static analysis OK
matttbe/checkpatch warning total: 0 errors, 1 warnings, 0 checks, 16 lines checked
matttbe/shellcheck success MPTCP selftests files have not been modified
matttbe/KVM_Validation__normal success Success! ✅
matttbe/KVM_Validation__debug success Success! ✅
matttbe/KVM_Validation__btf__only_bpftest_all_ success Success! ✅

Commit Message

Matthieu Baerts (NGI0) Sept. 12, 2024, 10:25 a.m. UTC
When CONFIG_SYSFS is not set, the kernel fails to compile:

     net/core/page_pool_user.c:368:45: error: implicit declaration of function 'get_netdev_rx_queue_index' [-Werror=implicit-function-declaration]
      368 |                 if (pool->slow.queue_idx == get_netdev_rx_queue_index(rxq)) {
          |                                             ^~~~~~~~~~~~~~~~~~~~~~~~~

When CONFIG_SYSFS is not set, get_netdev_rx_queue_index() is not defined
as well. In this case, page_pool_check_memory_provider() cannot check
the memory provider, and a success answer can be returned instead.

Fixes: 0f9214046893 ("memory-provider: dmabuf devmem memory provider")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/core/page_pool_user.c | 4 ++++
 1 file changed, 4 insertions(+)


---
base-commit: 3cfb5aa10cb78571e214e48a3a6e42c11d5288a1
change-id: 20240912-net-next-fix-get_netdev_rx_queue_index-a1034d1b962a

Best regards,

Comments

MPTCP CI Sept. 12, 2024, 11:20 a.m. UTC | #1
Hi Matthieu,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Success! ✅
- KVM Validation: btf (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/10829149648

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/b906440906a3
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=889709


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
Mina Almasry Sept. 12, 2024, 12:49 p.m. UTC | #2
On Thu, Sep 12, 2024 at 3:25 AM Matthieu Baerts (NGI0)
<matttbe@kernel.org> wrote:
>
> When CONFIG_SYSFS is not set, the kernel fails to compile:
>
>      net/core/page_pool_user.c:368:45: error: implicit declaration of function 'get_netdev_rx_queue_index' [-Werror=implicit-function-declaration]
>       368 |                 if (pool->slow.queue_idx == get_netdev_rx_queue_index(rxq)) {
>           |                                             ^~~~~~~~~~~~~~~~~~~~~~~~~
>
> When CONFIG_SYSFS is not set, get_netdev_rx_queue_index() is not defined
> as well. In this case, page_pool_check_memory_provider() cannot check
> the memory provider, and a success answer can be returned instead.
>

Thanks Matthieu, and sorry about that.

I have reproduced the build error and the fix resolves it. But...

> Fixes: 0f9214046893 ("memory-provider: dmabuf devmem memory provider")
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
>  net/core/page_pool_user.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
> index 48335766c1bf..a98c0a76b33f 100644
> --- a/net/core/page_pool_user.c
> +++ b/net/core/page_pool_user.c
> @@ -353,6 +353,7 @@ void page_pool_unlist(struct page_pool *pool)
>  int page_pool_check_memory_provider(struct net_device *dev,
>                                     struct netdev_rx_queue *rxq)
>  {
> +#ifdef CONFIG_SYSFS
>         struct net_devmem_dmabuf_binding *binding = rxq->mp_params.mp_priv;
>         struct page_pool *pool;
>         struct hlist_node *n;
> @@ -372,6 +373,9 @@ int page_pool_check_memory_provider(struct net_device *dev,
>         }
>         mutex_unlock(&page_pools_lock);
>         return -ENODATA;
> +#else
> +       return 0;

...we can't assume success when we cannot check the memory provider.
The memory provider check is somewhat critical; we rely on it to
detect that the driver does not support memory providers or is not
doing the right thing, and report that to the user. I don't think we
can silently disable the check when the CONFIG_SYSFS is disabled.
Please return -ENODATA or some other error here.

If we disable devmem TCP for !CONFIG_SYSFS we should probably add
something to the docs saying this. I can do that in a follow up
change.

However, I'm looking at the definition of get_netdev_rx_queue_index()
and at first glance I don't see anything there that is actually
dependent on CONFIG_SYSFS. Can we do this instead? I have build-tested
it and it resolves the build issue as well:

```
diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
index ac34f5fb4f71..596836abf7bf 100644
--- a/include/net/netdev_rx_queue.h
+++ b/include/net/netdev_rx_queue.h
@@ -45,7 +45,6 @@ __netif_get_rx_queue(struct net_device *dev, unsigned int rxq)
        return dev->_rx + rxq;
 }

-#ifdef CONFIG_SYSFS
 static inline unsigned int
 get_netdev_rx_queue_index(struct netdev_rx_queue *queue)
 {
@@ -55,7 +54,6 @@ get_netdev_rx_queue_index(struct netdev_rx_queue *queue)
        BUG_ON(index >= dev->num_rx_queues);
        return index;
 }
-#endif
 ```

Matthieu, I'm happy to follow up with v2 of this fix if you don't have time.
Matthieu Baerts (NGI0) Sept. 12, 2024, 3:26 p.m. UTC | #3
Hi Mina,

Thank you for your reply!

On 12/09/2024 14:49, Mina Almasry wrote:
> On Thu, Sep 12, 2024 at 3:25 AM Matthieu Baerts (NGI0)
> <matttbe@kernel.org> wrote:
>>
>> When CONFIG_SYSFS is not set, the kernel fails to compile:
>>
>>      net/core/page_pool_user.c:368:45: error: implicit declaration of function 'get_netdev_rx_queue_index' [-Werror=implicit-function-declaration]
>>       368 |                 if (pool->slow.queue_idx == get_netdev_rx_queue_index(rxq)) {
>>           |                                             ^~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> When CONFIG_SYSFS is not set, get_netdev_rx_queue_index() is not defined
>> as well. In this case, page_pool_check_memory_provider() cannot check
>> the memory provider, and a success answer can be returned instead.
>>
> 
> Thanks Matthieu, and sorry about that.
> 
> I have reproduced the build error and the fix resolves it. But...
> 
>> Fixes: 0f9214046893 ("memory-provider: dmabuf devmem memory provider")
>> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>> ---
>>  net/core/page_pool_user.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
>> index 48335766c1bf..a98c0a76b33f 100644
>> --- a/net/core/page_pool_user.c
>> +++ b/net/core/page_pool_user.c
>> @@ -353,6 +353,7 @@ void page_pool_unlist(struct page_pool *pool)
>>  int page_pool_check_memory_provider(struct net_device *dev,
>>                                     struct netdev_rx_queue *rxq)
>>  {
>> +#ifdef CONFIG_SYSFS
>>         struct net_devmem_dmabuf_binding *binding = rxq->mp_params.mp_priv;
>>         struct page_pool *pool;
>>         struct hlist_node *n;
>> @@ -372,6 +373,9 @@ int page_pool_check_memory_provider(struct net_device *dev,
>>         }
>>         mutex_unlock(&page_pools_lock);
>>         return -ENODATA;
>> +#else
>> +       return 0;
> 
> ...we can't assume success when we cannot check the memory provider.
> The memory provider check is somewhat critical; we rely on it to
> detect that the driver does not support memory providers or is not
> doing the right thing, and report that to the user. I don't think we
> can silently disable the check when the CONFIG_SYSFS is disabled.
> Please return -ENODATA or some other error here.

I initially returned 0 to have the same behaviour as when
CONFIG_PAGE_POOL is not defined. But thanks to your explanations, I
understand it seems better to return -ENODATA here. Or another errno, to
let the userspace understanding there is a different error? I can send a
v2 after the 24h rate-limit period if you are OK with that.

> If we disable devmem TCP for !CONFIG_SYSFS we should probably add
> something to the docs saying this. I can do that in a follow up
> change.

Good point, thank you.

> However, I'm looking at the definition of get_netdev_rx_queue_index()
> and at first glance I don't see anything there that is actually
> dependent on CONFIG_SYSFS. Can we do this instead? I have build-tested
> it and it resolves the build issue as well:
> 
> ```
> diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
> index ac34f5fb4f71..596836abf7bf 100644
> --- a/include/net/netdev_rx_queue.h
> +++ b/include/net/netdev_rx_queue.h
> @@ -45,7 +45,6 @@ __netif_get_rx_queue(struct net_device *dev, unsigned int rxq)
>         return dev->_rx + rxq;
>  }
> 
> -#ifdef CONFIG_SYSFS
>  static inline unsigned int
>  get_netdev_rx_queue_index(struct netdev_rx_queue *queue)
>  {
> @@ -55,7 +54,6 @@ get_netdev_rx_queue_index(struct netdev_rx_queue *queue)
>         BUG_ON(index >= dev->num_rx_queues);
>         return index;
>  }
> -#endif
>  ```

I briefly looked at taking this path when I saw what this helper was
doing, but then I saw all operations related to the received queues were
enabled only when CONFIG_SYSFS is set, see commit a953be53ce40
("net-sysfs: add support for device-specific rx queue sysfs
attributes"). I understood from that it is better not to look at
dev->_rx or dev->num_rx_queues when CONFIG_SYSFS is not set. I'm not
very familiar to that part of the code, but it feels like removing this
#ifdef might be similar to the "return 0" I suggested: silently
disabling the check, no?

I *think* it might be clearer to return an error when SYSFS is not set.

> Matthieu, I'm happy to follow up with v2 of this fix if you don't have time.
If you prefer to explore other ways than returning an error in
page_pool_check_memory_provider() when SYSFS is not set, yes please do
the follow-up if you don't mind. My main goal is to stop my CI to
complain about that when compiling with 'make tinyconfig' + MPTCP :)

Cheers,
Matt
Mina Almasry Sept. 12, 2024, 6:21 p.m. UTC | #4
On Thu, Sep 12, 2024 at 8:26 AM Matthieu Baerts <matttbe@kernel.org> wrote:
>
> Hi Mina,
>
> Thank you for your reply!
>
> On 12/09/2024 14:49, Mina Almasry wrote:
> > On Thu, Sep 12, 2024 at 3:25 AM Matthieu Baerts (NGI0)
> > <matttbe@kernel.org> wrote:
> >>
> >> When CONFIG_SYSFS is not set, the kernel fails to compile:
> >>
> >>      net/core/page_pool_user.c:368:45: error: implicit declaration of function 'get_netdev_rx_queue_index' [-Werror=implicit-function-declaration]
> >>       368 |                 if (pool->slow.queue_idx == get_netdev_rx_queue_index(rxq)) {
> >>           |                                             ^~~~~~~~~~~~~~~~~~~~~~~~~
> >>
> >> When CONFIG_SYSFS is not set, get_netdev_rx_queue_index() is not defined
> >> as well. In this case, page_pool_check_memory_provider() cannot check
> >> the memory provider, and a success answer can be returned instead.
> >>
> >
> > Thanks Matthieu, and sorry about that.
> >
> > I have reproduced the build error and the fix resolves it. But...
> >
> >> Fixes: 0f9214046893 ("memory-provider: dmabuf devmem memory provider")
> >> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> >> ---
> >>  net/core/page_pool_user.c | 4 ++++
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
> >> index 48335766c1bf..a98c0a76b33f 100644
> >> --- a/net/core/page_pool_user.c
> >> +++ b/net/core/page_pool_user.c
> >> @@ -353,6 +353,7 @@ void page_pool_unlist(struct page_pool *pool)
> >>  int page_pool_check_memory_provider(struct net_device *dev,
> >>                                     struct netdev_rx_queue *rxq)
> >>  {
> >> +#ifdef CONFIG_SYSFS
> >>         struct net_devmem_dmabuf_binding *binding = rxq->mp_params.mp_priv;
> >>         struct page_pool *pool;
> >>         struct hlist_node *n;
> >> @@ -372,6 +373,9 @@ int page_pool_check_memory_provider(struct net_device *dev,
> >>         }
> >>         mutex_unlock(&page_pools_lock);
> >>         return -ENODATA;
> >> +#else
> >> +       return 0;
> >
> > ...we can't assume success when we cannot check the memory provider.
> > The memory provider check is somewhat critical; we rely on it to
> > detect that the driver does not support memory providers or is not
> > doing the right thing, and report that to the user. I don't think we
> > can silently disable the check when the CONFIG_SYSFS is disabled.
> > Please return -ENODATA or some other error here.
>
> I initially returned 0 to have the same behaviour as when
> CONFIG_PAGE_POOL is not defined. But thanks to your explanations, I
> understand it seems better to return -ENODATA here. Or another errno, to
> let the userspace understanding there is a different error? I can send a
> v2 after the 24h rate-limit period if you are OK with that.
>

Yes, -EOPNOTSUPP would be my preference here. I think it makes sense,
we should not support memory-providers on configs where core can't
verify that the driver did the right thing.

[...]

> > However, I'm looking at the definition of get_netdev_rx_queue_index()
> > and at first glance I don't see anything there that is actually
> > dependent on CONFIG_SYSFS. Can we do this instead? I have build-tested
> > it and it resolves the build issue as well:
> >
> > ```
> > diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
> > index ac34f5fb4f71..596836abf7bf 100644
> > --- a/include/net/netdev_rx_queue.h
> > +++ b/include/net/netdev_rx_queue.h
> > @@ -45,7 +45,6 @@ __netif_get_rx_queue(struct net_device *dev, unsigned int rxq)
> >         return dev->_rx + rxq;
> >  }
> >
> > -#ifdef CONFIG_SYSFS
> >  static inline unsigned int
> >  get_netdev_rx_queue_index(struct netdev_rx_queue *queue)
> >  {
> > @@ -55,7 +54,6 @@ get_netdev_rx_queue_index(struct netdev_rx_queue *queue)
> >         BUG_ON(index >= dev->num_rx_queues);
> >         return index;
> >  }
> > -#endif
> >  ```
>
> I briefly looked at taking this path when I saw what this helper was
> doing, but then I saw all operations related to the received queues were
> enabled only when CONFIG_SYSFS is set, see commit a953be53ce40
> ("net-sysfs: add support for device-specific rx queue sysfs
> attributes"). I understood from that it is better not to look at
> dev->_rx or dev->num_rx_queues when CONFIG_SYSFS is not set. I'm not
> very familiar to that part of the code, but it feels like removing this
> #ifdef might be similar to the "return 0" I suggested: silently
> disabling the check, no?
>
> I *think* it might be clearer to return an error when SYSFS is not set.
>

FWIW it looks like commit e817f85652c1 ("xdp: generic XDP handling of
xdp_rxq_info") reverted almost all the CONFIG_SYSFS checks set by
commit a953be53ce40 ("net-sysfs: add support for device-specific rx
queue sysfs attributes"), at least from a quick look.

But I understand your CI is probably very annoyed by the build
failure. I would be happy to reviewed-by a patch with just the change
to the error return value, and I can look into making this work with
CONFIG_SYSFS after the merge window.
Jakub Kicinski Sept. 13, 2024, 3:10 a.m. UTC | #5
On Thu, 12 Sep 2024 11:21:23 -0700 Mina Almasry wrote:
> > I briefly looked at taking this path when I saw what this helper was
> > doing, but then I saw all operations related to the received queues were
> > enabled only when CONFIG_SYSFS is set, see commit a953be53ce40
> > ("net-sysfs: add support for device-specific rx queue sysfs
> > attributes"). I understood from that it is better not to look at
> > dev->_rx or dev->num_rx_queues when CONFIG_SYSFS is not set. I'm not
> > very familiar to that part of the code, but it feels like removing this
> > #ifdef might be similar to the "return 0" I suggested: silently
> > disabling the check, no?
> >
> > I *think* it might be clearer to return an error when SYSFS is not set.
> >  
> 
> FWIW it looks like commit e817f85652c1 ("xdp: generic XDP handling of
> xdp_rxq_info") reverted almost all the CONFIG_SYSFS checks set by
> commit a953be53ce40 ("net-sysfs: add support for device-specific rx
> queue sysfs attributes"), at least from a quick look.

That's right, just delete the ifdef. I should have done that when
I moved the helper. Please send the fix ASAP.
diff mbox series

Patch

diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index 48335766c1bf..a98c0a76b33f 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -353,6 +353,7 @@  void page_pool_unlist(struct page_pool *pool)
 int page_pool_check_memory_provider(struct net_device *dev,
 				    struct netdev_rx_queue *rxq)
 {
+#ifdef CONFIG_SYSFS
 	struct net_devmem_dmabuf_binding *binding = rxq->mp_params.mp_priv;
 	struct page_pool *pool;
 	struct hlist_node *n;
@@ -372,6 +373,9 @@  int page_pool_check_memory_provider(struct net_device *dev,
 	}
 	mutex_unlock(&page_pools_lock);
 	return -ENODATA;
+#else
+	return 0;
+#endif
 }
 
 static void page_pool_unreg_netdev_wipe(struct net_device *netdev)