[-,resend] VFS: use synchronize_rcu_expedited() in namespace_unlock()
diff mbox series

Message ID 87tvm1rxme.fsf@notabene.neil.brown.name
State New
Headers show
Series
  • [-,resend] VFS: use synchronize_rcu_expedited() in namespace_unlock()
Related show

Commit Message

NeilBrown Oct. 5, 2018, 1:27 a.m. UTC
The synchronize_rcu() in namespace_unlock() is called every time
a filesystem is unmounted.  If a great many filesystems are mounted,
this can cause a noticable slow-down in, for example, system shutdown.

The sequence:
  mkdir -p /tmp/Mtest/{0..5000}
  time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
  time umount /tmp/Mtest/*

on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
100 seconds to unmount them.

Boot the same VM with 1 CPU and it takes 18 seconds to mount the
tmpfs filesystems, but only 36 to unmount.

If we change the synchronize_rcu() to synchronize_rcu_expedited()
the umount time on a 4-cpu VM drop to 0.6 seconds

I think this 200-fold speed up is worth the slightly high system
impact of using synchronize_rcu_expedited().

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (from general rcu perspective)
Signed-off-by: NeilBrown <neilb@suse.com>
---

I posted this last October, then again last November (cc:ing Linus)
Paul is happy enough with it, but no other response.
I'm hoping it can get applied this time....

Thanks,
NeilBrown


 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Al Viro Oct. 5, 2018, 1:40 a.m. UTC | #1
On Fri, Oct 05, 2018 at 11:27:37AM +1000, NeilBrown wrote:
> 
> The synchronize_rcu() in namespace_unlock() is called every time
> a filesystem is unmounted.  If a great many filesystems are mounted,
> this can cause a noticable slow-down in, for example, system shutdown.
> 
> The sequence:
>   mkdir -p /tmp/Mtest/{0..5000}
>   time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
>   time umount /tmp/Mtest/*
> 
> on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
> 100 seconds to unmount them.
> 
> Boot the same VM with 1 CPU and it takes 18 seconds to mount the
> tmpfs filesystems, but only 36 to unmount.
> 
> If we change the synchronize_rcu() to synchronize_rcu_expedited()
> the umount time on a 4-cpu VM drop to 0.6 seconds
> 
> I think this 200-fold speed up is worth the slightly high system
> impact of using synchronize_rcu_expedited().
> 
> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (from general rcu perspective)
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
> 
> I posted this last October, then again last November (cc:ing Linus)
> Paul is happy enough with it, but no other response.
> I'm hoping it can get applied this time....

Umm...  IIRC, the last one got sidetracked on the other thing in the series...
<checks> that was s_anon stuff.  I can live with this one; FWIW, what kind
of load would trigger the impact of the change?  Paul?
NeilBrown Oct. 5, 2018, 2:53 a.m. UTC | #2
On Fri, Oct 05 2018, Al Viro wrote:

> On Fri, Oct 05, 2018 at 11:27:37AM +1000, NeilBrown wrote:
>> 
>> The synchronize_rcu() in namespace_unlock() is called every time
>> a filesystem is unmounted.  If a great many filesystems are mounted,
>> this can cause a noticable slow-down in, for example, system shutdown.
>> 
>> The sequence:
>>   mkdir -p /tmp/Mtest/{0..5000}
>>   time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
>>   time umount /tmp/Mtest/*
>> 
>> on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
>> 100 seconds to unmount them.
>> 
>> Boot the same VM with 1 CPU and it takes 18 seconds to mount the
>> tmpfs filesystems, but only 36 to unmount.
>> 
>> If we change the synchronize_rcu() to synchronize_rcu_expedited()
>> the umount time on a 4-cpu VM drop to 0.6 seconds
>> 
>> I think this 200-fold speed up is worth the slightly high system
>> impact of using synchronize_rcu_expedited().
>> 
>> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (from general rcu perspective)
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>> 
>> I posted this last October, then again last November (cc:ing Linus)
>> Paul is happy enough with it, but no other response.
>> I'm hoping it can get applied this time....
>
> Umm...  IIRC, the last one got sidetracked on the other thing in the series...
> <checks> that was s_anon stuff.  I can live with this one; FWIW, what kind
> of load would trigger the impact of the change?  Paul?

I think you would need a long sequence of umounts to notice anything.
What you would notice is substantially reduced wall-clock time, but
slightly increased CPU time.

The original bug report that lead to this patch was a system with "HUGE
direct automount maps (>23k at this point)".
Stopping autofs (during shutdown) took more minutes than seemed
reasonable.

I noticed it again just recently when working on a systemd issue.  If
you mount thousands of filesystems in quick succession (ClearCase can do
this), systemd processes /proc/self/mountinfo constantly and slows down
the whole process.  When I unmount my test filesystems (mount --bind
/etc /MNT/$1) it takes a similar amount of time, but now it isn't
systemd slowing things down (which is odd actually, I wonder why systemd
didn't notice..) but rather the synchronize_rcu() delays.

Thanks,
NeilBrown
Paul E. McKenney Oct. 5, 2018, 4:08 a.m. UTC | #3
On Fri, Oct 05, 2018 at 02:40:02AM +0100, Al Viro wrote:
> On Fri, Oct 05, 2018 at 11:27:37AM +1000, NeilBrown wrote:
> > 
> > The synchronize_rcu() in namespace_unlock() is called every time
> > a filesystem is unmounted.  If a great many filesystems are mounted,
> > this can cause a noticable slow-down in, for example, system shutdown.
> > 
> > The sequence:
> >   mkdir -p /tmp/Mtest/{0..5000}
> >   time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
> >   time umount /tmp/Mtest/*
> > 
> > on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
> > 100 seconds to unmount them.
> > 
> > Boot the same VM with 1 CPU and it takes 18 seconds to mount the
> > tmpfs filesystems, but only 36 to unmount.
> > 
> > If we change the synchronize_rcu() to synchronize_rcu_expedited()
> > the umount time on a 4-cpu VM drop to 0.6 seconds
> > 
> > I think this 200-fold speed up is worth the slightly high system
> > impact of using synchronize_rcu_expedited().
> > 
> > Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (from general rcu perspective)
> > Signed-off-by: NeilBrown <neilb@suse.com>
> > ---
> > 
> > I posted this last October, then again last November (cc:ing Linus)
> > Paul is happy enough with it, but no other response.
> > I'm hoping it can get applied this time....
> 
> Umm...  IIRC, the last one got sidetracked on the other thing in the series...
> <checks> that was s_anon stuff.  I can live with this one; FWIW, what kind
> of load would trigger the impact of the change?  Paul?

You lost me with "what kind of load would trigger the impact of the
change?", but if you are asking about the downside, that would be IPIs
sent from each call to synchronize_rcu_expedited().  But people with
things like real-time workloads that therefore don't like those IPIs
have a number of options:

1.	Boot with rcupdate.rcu_normal=1, which converts all calls to
	synchronize_rcu_expedited() to synchronize_rcu().  This of
	course loses the performance gain, but this can be a good
	tradeoff for real-time workloads.

2.	Build with CONFIG_NO_HZ_FULL=y and boot with nohz_full= to
	cover the CPUs running your real-time workload.  Then
	as long as there is only one runnable usermode task per
	nohz_full CPU, synchronize_rcu_expedited() will avoid sending
	IPIs to any of the nohz_full CPUs.

3.	Don't do unmounts while your real-time application is running.

Probably other options as well, but those are the ones that come
immediately to mind.

If I missed the point of your question, please help me understand
what you are asking for.

							Thanx, Paul
NeilBrown Nov. 6, 2018, 3:15 a.m. UTC | #4
On Fri, Oct 05 2018, NeilBrown wrote:

> The synchronize_rcu() in namespace_unlock() is called every time
> a filesystem is unmounted.  If a great many filesystems are mounted,
> this can cause a noticable slow-down in, for example, system shutdown.
>
> The sequence:
>   mkdir -p /tmp/Mtest/{0..5000}
>   time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
>   time umount /tmp/Mtest/*
>
> on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
> 100 seconds to unmount them.
>
> Boot the same VM with 1 CPU and it takes 18 seconds to mount the
> tmpfs filesystems, but only 36 to unmount.
>
> If we change the synchronize_rcu() to synchronize_rcu_expedited()
> the umount time on a 4-cpu VM drop to 0.6 seconds
>
> I think this 200-fold speed up is worth the slightly high system
> impact of using synchronize_rcu_expedited().
>
> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (from general rcu perspective)
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>
> I posted this last October, then again last November (cc:ing Linus)
> Paul is happy enough with it, but no other response.
> I'm hoping it can get applied this time....

Hi Al,
 this isn't in 4.20-rc1.  Are you still waiting for something?

Thanks,
NeilBrown


>
> Thanks,
> NeilBrown
>
>
>  fs/namespace.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 99186556f8d3..02e978b22294 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1360,7 +1360,7 @@ static void namespace_unlock(void)
>  	if (likely(hlist_empty(&head)))
>  		return;
>  
> -	synchronize_rcu();
> +	synchronize_rcu_expedited();
>  
>  	group_pin_kill(&head);
>  }
> -- 
> 2.14.0.rc0.dirty
NeilBrown Nov. 29, 2018, 11:33 p.m. UTC | #5
The synchronize_rcu() in namespace_unlock() is called every time
a filesystem is unmounted.  If a great many filesystems are mounted,
this can cause a noticable slow-down in, for example, system shutdown.

The sequence:
  mkdir -p /tmp/Mtest/{0..5000}
  time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
  time umount /tmp/Mtest/*

on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
100 seconds to unmount them.

Boot the same VM with 1 CPU and it takes 18 seconds to mount the
tmpfs filesystems, but only 36 to unmount.

If we change the synchronize_rcu() to synchronize_rcu_expedited()
the umount time on a 4-cpu VM drop to 0.6 seconds

I think this 200-fold speed up is worth the slightly high system
impact of using synchronize_rcu_expedited().

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (from general rcu perspective)
Signed-off-by: NeilBrown <neilb@suse.com>
---

Al Viro says "I can live with this one" but this still hasn't landed.
Maybe someone else could apply it?

Thanks,
NeilBrown

Full quote from Al on 5th Oct:
> Umm...  IIRC, the last one got sidetracked on the other thing in the series...
> <checks> that was s_anon stuff.  I can live with this one; FWIW, what kind
> of load would trigger the impact of the change?  Paul?
which Paul replied to.

 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index a7f91265ea67..43a0d2c7449d 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1360,7 +1360,7 @@ static void namespace_unlock(void)
 	if (likely(hlist_empty(&head)))
 		return;
 
-	synchronize_rcu();
+	synchronize_rcu_expedited();
 
 	group_pin_kill(&head);
 }
Al Viro Nov. 29, 2018, 11:52 p.m. UTC | #6
On Fri, Nov 30, 2018 at 10:33:18AM +1100, NeilBrown wrote:
> 
> The synchronize_rcu() in namespace_unlock() is called every time
> a filesystem is unmounted.  If a great many filesystems are mounted,
> this can cause a noticable slow-down in, for example, system shutdown.
> 
> The sequence:
>   mkdir -p /tmp/Mtest/{0..5000}
>   time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
>   time umount /tmp/Mtest/*
> 
> on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
> 100 seconds to unmount them.
> 
> Boot the same VM with 1 CPU and it takes 18 seconds to mount the
> tmpfs filesystems, but only 36 to unmount.
> 
> If we change the synchronize_rcu() to synchronize_rcu_expedited()
> the umount time on a 4-cpu VM drop to 0.6 seconds
> 
> I think this 200-fold speed up is worth the slightly high system
> impact of using synchronize_rcu_expedited().
> 
> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (from general rcu perspective)
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
> 
> Al Viro says "I can live with this one" but this still hasn't landed.
> Maybe someone else could apply it?

Applied (in work.misc, once I push it out)
NeilBrown Nov. 30, 2018, 1:09 a.m. UTC | #7
On Thu, Nov 29 2018, Al Viro wrote:

> On Fri, Nov 30, 2018 at 10:33:18AM +1100, NeilBrown wrote:
>> 
>> The synchronize_rcu() in namespace_unlock() is called every time
>> a filesystem is unmounted.  If a great many filesystems are mounted,
>> this can cause a noticable slow-down in, for example, system shutdown.
>> 
>> The sequence:
>>   mkdir -p /tmp/Mtest/{0..5000}
>>   time for i in /tmp/Mtest/*; do mount -t tmpfs tmpfs $i ; done
>>   time umount /tmp/Mtest/*
>> 
>> on a 4-cpu VM can report 8 seconds to mount the tmpfs filesystems, and
>> 100 seconds to unmount them.
>> 
>> Boot the same VM with 1 CPU and it takes 18 seconds to mount the
>> tmpfs filesystems, but only 36 to unmount.
>> 
>> If we change the synchronize_rcu() to synchronize_rcu_expedited()
>> the umount time on a 4-cpu VM drop to 0.6 seconds
>> 
>> I think this 200-fold speed up is worth the slightly high system
>> impact of using synchronize_rcu_expedited().
>> 
>> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (from general rcu perspective)
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>> 
>> Al Viro says "I can live with this one" but this still hasn't landed.
>> Maybe someone else could apply it?
>
> Applied (in work.misc, once I push it out)

Excellent - thanks a lot :-)

NeilBrown

Patch
diff mbox series

diff --git a/fs/namespace.c b/fs/namespace.c
index 99186556f8d3..02e978b22294 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1360,7 +1360,7 @@  static void namespace_unlock(void)
 	if (likely(hlist_empty(&head)))
 		return;
 
-	synchronize_rcu();
+	synchronize_rcu_expedited();
 
 	group_pin_kill(&head);
 }