diff mbox series

[net-next] net-sysfs: remove possible sleep from an RCU read-side critical section

Message ID 20210322154329.340048-1-atenart@kernel.org (mailing list archive)
State Accepted
Commit 7f08ec6e04269ce53b664761c9108b44ed2f54ab
Delegated to: Netdev Maintainers
Headers show
Series [net-next] net-sysfs: remove possible sleep from an RCU read-side critical section | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 3 maintainers not CCed: alexanderduyck@fb.com edumazet@google.com weiwan@google.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 1 this patch: 1
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 1 this patch: 1
netdev/header_inline success Link

Commit Message

Antoine Tenart March 22, 2021, 3:43 p.m. UTC
xps_queue_show is mostly made of an RCU read-side critical section and
calls bitmap_zalloc with GFP_KERNEL in the middle of it. That is not
allowed as this call may sleep and such behaviours aren't allowed in RCU
read-side critical sections. Fix this by using GFP_NOWAIT instead.

Fixes: 5478fcd0f483 ("net: embed nr_ids in the xps maps")
Reported-by: kernel test robot <oliver.sang@intel.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
---

Fix sent to net-next as it fixes an issue only in net-next.

 net/core/net-sysfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Matthew Wilcox (Oracle) March 22, 2021, 4:54 p.m. UTC | #1
On Mon, Mar 22, 2021 at 04:43:29PM +0100, Antoine Tenart wrote:
> xps_queue_show is mostly made of an RCU read-side critical section and
> calls bitmap_zalloc with GFP_KERNEL in the middle of it. That is not
> allowed as this call may sleep and such behaviours aren't allowed in RCU
> read-side critical sections. Fix this by using GFP_NOWAIT instead.

This would be another way of fixing the problem that is slightly less
complex than my initial proposal, but does allow for using GFP_KERNEL
for fewer failures:

@@ -1366,11 +1366,10 @@ static ssize_t xps_queue_show(struct net_device *dev, unsigned int index,
 {
        struct xps_dev_maps *dev_maps;
        unsigned long *mask;
-       unsigned int nr_ids;
+       unsigned int nr_ids, new_nr_ids;
        int j, len;
 
-       rcu_read_lock();
-       dev_maps = rcu_dereference(dev->xps_maps[type]);
+       dev_maps = READ_ONCE(dev->xps_maps[type]);
 
        /* Default to nr_cpu_ids/dev->num_rx_queues and do not just return 0
         * when dev_maps hasn't been allocated yet, to be backward compatible.
@@ -1379,10 +1378,18 @@ static ssize_t xps_queue_show(struct net_device *dev, unsigned int index,
                 (type == XPS_CPUS ? nr_cpu_ids : dev->num_rx_queues);
 
        mask = bitmap_zalloc(nr_ids, GFP_KERNEL);
-       if (!mask) {
-               rcu_read_unlock();
+       if (!mask)
                return -ENOMEM;
-       }
+
+       rcu_read_lock();
+       dev_maps = rcu_dereference(dev->xps_maps[type]);
+       /* if nr_ids shrank in the meantime, do not overrun array.
+        * if it increased, we just won't show the new ones
+        */
+       new_nr_ids = dev_maps ? dev_maps->nr_ids :
+                       (type == XPS_CPUS ? nr_cpu_ids : dev->num_rx_queues);
+       if (new_nr_ids < nr_ids)
+               nr_ids = new_nr_ids;
 
        if (!dev_maps || tc >= dev_maps->num_tc)
                goto out_no_maps;

(or do we need the rcu read lock to read dev->num_rcx_queues? i'm assuming
we only need it to read the xps_maps array)
Antoine Tenart March 22, 2021, 5:41 p.m. UTC | #2
Quoting Matthew Wilcox (2021-03-22 17:54:39)
> On Mon, Mar 22, 2021 at 04:43:29PM +0100, Antoine Tenart wrote:
> > xps_queue_show is mostly made of an RCU read-side critical section and
> > calls bitmap_zalloc with GFP_KERNEL in the middle of it. That is not
> > allowed as this call may sleep and such behaviours aren't allowed in RCU
> > read-side critical sections. Fix this by using GFP_NOWAIT instead.
> 
> This would be another way of fixing the problem that is slightly less
> complex than my initial proposal, but does allow for using GFP_KERNEL
> for fewer failures:
> 
> @@ -1366,11 +1366,10 @@ static ssize_t xps_queue_show(struct net_device *dev, unsigned int index,
>  {
>         struct xps_dev_maps *dev_maps;
>         unsigned long *mask;
> -       unsigned int nr_ids;
> +       unsigned int nr_ids, new_nr_ids;
>         int j, len;
>  
> -       rcu_read_lock();
> -       dev_maps = rcu_dereference(dev->xps_maps[type]);
> +       dev_maps = READ_ONCE(dev->xps_maps[type]);

Couldn't dev_maps be freed between here and the read of dev_maps->nr_ids
as we're not in an RCU read-side critical section?

>         /* Default to nr_cpu_ids/dev->num_rx_queues and do not just return 0
>          * when dev_maps hasn't been allocated yet, to be backward compatible.
> @@ -1379,10 +1378,18 @@ static ssize_t xps_queue_show(struct net_device *dev, unsigned int index,
>                  (type == XPS_CPUS ? nr_cpu_ids : dev->num_rx_queues);
>  
>         mask = bitmap_zalloc(nr_ids, GFP_KERNEL);
> -       if (!mask) {
> -               rcu_read_unlock();
> +       if (!mask)
>                 return -ENOMEM;
> -       }
> +
> +       rcu_read_lock();
> +       dev_maps = rcu_dereference(dev->xps_maps[type]);
> +       /* if nr_ids shrank in the meantime, do not overrun array.
> +        * if it increased, we just won't show the new ones
> +        */
> +       new_nr_ids = dev_maps ? dev_maps->nr_ids :
> +                       (type == XPS_CPUS ? nr_cpu_ids : dev->num_rx_queues);
> +       if (new_nr_ids < nr_ids)
> +               nr_ids = new_nr_ids;
>  
>         if (!dev_maps || tc >= dev_maps->num_tc)
>                 goto out_no_maps;

My feeling is there is not much value in having a tricky allocation
logic for reads from xps_cpus and xps_rxqs. While we could come up with
something, returning -ENOMEM on memory pressure should be fine.

Antoine
Matthew Wilcox (Oracle) March 22, 2021, 5:44 p.m. UTC | #3
On Mon, Mar 22, 2021 at 06:41:30PM +0100, Antoine Tenart wrote:
> Quoting Matthew Wilcox (2021-03-22 17:54:39)
> > -       rcu_read_lock();
> > -       dev_maps = rcu_dereference(dev->xps_maps[type]);
> > +       dev_maps = READ_ONCE(dev->xps_maps[type]);
> 
> Couldn't dev_maps be freed between here and the read of dev_maps->nr_ids
> as we're not in an RCU read-side critical section?

Oh, good point.  Never mind, then.

> My feeling is there is not much value in having a tricky allocation
> logic for reads from xps_cpus and xps_rxqs. While we could come up with
> something, returning -ENOMEM on memory pressure should be fine.

That's fine.  It's your code, and this is probably a small allocation
anyway.
Antoine Tenart March 22, 2021, 5:45 p.m. UTC | #4
Quoting Antoine Tenart (2021-03-22 18:41:30)
> Quoting Matthew Wilcox (2021-03-22 17:54:39)
> > On Mon, Mar 22, 2021 at 04:43:29PM +0100, Antoine Tenart wrote:
> > > xps_queue_show is mostly made of an RCU read-side critical section and
> > > calls bitmap_zalloc with GFP_KERNEL in the middle of it. That is not
> > > allowed as this call may sleep and such behaviours aren't allowed in RCU
> > > read-side critical sections. Fix this by using GFP_NOWAIT instead.
> > 
> > This would be another way of fixing the problem that is slightly less
> > complex than my initial proposal, but does allow for using GFP_KERNEL
> > for fewer failures:
> > 
> > @@ -1366,11 +1366,10 @@ static ssize_t xps_queue_show(struct net_device *dev, unsigned int index,
> >  {
> >         struct xps_dev_maps *dev_maps;
> >         unsigned long *mask;
> > -       unsigned int nr_ids;
> > +       unsigned int nr_ids, new_nr_ids;
> >         int j, len;
> >  
> > -       rcu_read_lock();
> > -       dev_maps = rcu_dereference(dev->xps_maps[type]);
> > +       dev_maps = READ_ONCE(dev->xps_maps[type]);
> 
> Couldn't dev_maps be freed between here and the read of dev_maps->nr_ids
> as we're not in an RCU read-side critical section?

* The first read of dev_maps->nr_ids, happening before rcu_read_lock,
  not the one shown below.

> >         /* Default to nr_cpu_ids/dev->num_rx_queues and do not just return 0
> >          * when dev_maps hasn't been allocated yet, to be backward compatible.
> > @@ -1379,10 +1378,18 @@ static ssize_t xps_queue_show(struct net_device *dev, unsigned int index,
> >                  (type == XPS_CPUS ? nr_cpu_ids : dev->num_rx_queues);
> >  
> >         mask = bitmap_zalloc(nr_ids, GFP_KERNEL);
> > -       if (!mask) {
> > -               rcu_read_unlock();
> > +       if (!mask)
> >                 return -ENOMEM;
> > -       }
> > +
> > +       rcu_read_lock();
> > +       dev_maps = rcu_dereference(dev->xps_maps[type]);
> > +       /* if nr_ids shrank in the meantime, do not overrun array.
> > +        * if it increased, we just won't show the new ones
> > +        */
> > +       new_nr_ids = dev_maps ? dev_maps->nr_ids :
> > +                       (type == XPS_CPUS ? nr_cpu_ids : dev->num_rx_queues);
> > +       if (new_nr_ids < nr_ids)
> > +               nr_ids = new_nr_ids;
Antoine Tenart March 22, 2021, 5:49 p.m. UTC | #5
Quoting Matthew Wilcox (2021-03-22 18:44:21)
> On Mon, Mar 22, 2021 at 06:41:30PM +0100, Antoine Tenart wrote:
> > Quoting Matthew Wilcox (2021-03-22 17:54:39)
> > > -       rcu_read_lock();
> > > -       dev_maps = rcu_dereference(dev->xps_maps[type]);
> > > +       dev_maps = READ_ONCE(dev->xps_maps[type]);
> > 
> > Couldn't dev_maps be freed between here and the read of dev_maps->nr_ids
> > as we're not in an RCU read-side critical section?
> 
> Oh, good point.  Never mind, then.
> 
> > My feeling is there is not much value in having a tricky allocation
> > logic for reads from xps_cpus and xps_rxqs. While we could come up with
> > something, returning -ENOMEM on memory pressure should be fine.
> 
> That's fine.  It's your code, and this is probably a small allocation
> anyway.

All right. Thanks for the suggestions anyway!

Antoine
patchwork-bot+netdevbpf@kernel.org March 22, 2021, 8:40 p.m. UTC | #6
Hello:

This patch was applied to netdev/net-next.git (refs/heads/master):

On Mon, 22 Mar 2021 16:43:29 +0100 you wrote:
> xps_queue_show is mostly made of an RCU read-side critical section and
> calls bitmap_zalloc with GFP_KERNEL in the middle of it. That is not
> allowed as this call may sleep and such behaviours aren't allowed in RCU
> read-side critical sections. Fix this by using GFP_NOWAIT instead.
> 
> Fixes: 5478fcd0f483 ("net: embed nr_ids in the xps maps")
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Antoine Tenart <atenart@kernel.org>
> 
> [...]

Here is the summary with links:
  - [net-next] net-sysfs: remove possible sleep from an RCU read-side critical section
    https://git.kernel.org/netdev/net-next/c/7f08ec6e0426

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
diff mbox series

Patch

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 562a42fcd437..f6197774048b 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1378,7 +1378,7 @@  static ssize_t xps_queue_show(struct net_device *dev, unsigned int index,
 	nr_ids = dev_maps ? dev_maps->nr_ids :
 		 (type == XPS_CPUS ? nr_cpu_ids : dev->num_rx_queues);
 
-	mask = bitmap_zalloc(nr_ids, GFP_KERNEL);
+	mask = bitmap_zalloc(nr_ids, GFP_NOWAIT);
 	if (!mask) {
 		rcu_read_unlock();
 		return -ENOMEM;