
[PATCHv2] KVM: optimize apic interrupt delivery

Message ID 20131205230033.GB15492@linux.vnet.ibm.com (mailing list archive)
State New, archived

Commit Message

Paul E. McKenney Dec. 5, 2013, 11 p.m. UTC
On Thu, Nov 28, 2013 at 10:55:06AM +0200, Gleb Natapov wrote:
> On Wed, Nov 27, 2013 at 09:06:36AM -0800, Paul E. McKenney wrote:
> > On Wed, Nov 27, 2013 at 10:00:09AM +0200, Gleb Natapov wrote:
> > > On Tue, Nov 26, 2013 at 11:35:06AM -0800, Paul E. McKenney wrote:
> > > > On Tue, Nov 26, 2013 at 06:24:13PM +0200, Michael S. Tsirkin wrote:
> > > > > On Wed, Sep 12, 2012 at 08:13:54AM -0700, Paul E. McKenney wrote:
> > > > > > On Wed, Sep 12, 2012 at 03:44:26PM +0300, Gleb Natapov wrote:
> > > > > > > On Wed, Sep 12, 2012 at 03:36:57PM +0300, Avi Kivity wrote:
> > > > > > > > On 09/12/2012 03:34 PM, Gleb Natapov wrote:
> > > > > > > > > On Wed, Sep 12, 2012 at 10:45:22AM +0300, Avi Kivity wrote:
> > > > > > > > >> On 09/12/2012 04:03 AM, Paul E. McKenney wrote:
> > > > > > > > >> >> > > Paul, I'd like to check something with you here:
> > > > > > > > >> >> > > this function can be triggered by userspace,
> > > > > > > > >> >> > > any number of times; we allocate
> > > > > > > > >> >> > > a 2K chunk of memory that is later freed by
> > > > > > > > >> >> > > kfree_rcu.
> > > > > > > > >> >> > > 
> > > > > > > > >> >> > > Is there a risk of DOS if RCU is delayed while
> > > > > > > > >> >> > > lots of memory is queued up in this way?
> > > > > > > > >> >> > > If yes is this a generic problem with kfree_rcu
> > > > > > > > >> >> > > that should be addressed in core kernel?
> > > > > > > > >> >> > 
> > > > > > > > >> >> > There is indeed a risk.
> > > > > > > > >> >> 
> > > > > > > > >> >> In our case it's a 2K object. Is it a practical risk?
> > > > > > > > >> > 
> > > > > > > > >> > How many kfree_rcu()s per second can a given user cause to happen?
> > > > > > > > >> 
> > > > > > > > >> Not much more than a few hundred thousand per second per process (normal
> > > > > > > > >> operation is zero).
> > > > > > > > >> 
> > > > > > > > > I managed to do 21466 per second.
> > > > > > > > 
> > > > > > > > Strange, why so slow?
> > > > > > > > 
> > > > > > > Because ftrace buffer overflows :) With a bigger buffer I get 169940.
> > > > > > 
> > > > > > Ah, good, should not be a problem.  In contrast, if you ran kfree_rcu() in
> > > > > > a tight loop, you could probably do in excess of 100M per CPU per second.
> > > > > > Now -that- might be a problem.
> > > > > > 
> > > > > > Well, it -might- be a problem if you somehow figured out how to allocate
> > > > > > memory that quickly in a steady-state manner.  ;-)
> > > > > > 
> > > > > > > > >> Good idea.  Michael, it should be easy to modify kvm-unit-tests to write
> > > > > > > > >> to the APIC ID register in a loop.
> > > > > > > > >> 
> > > > > > > > > I did. Memory consumption does not grow on an otherwise idle host.
> > > > > > 
> > > > > > Very good -- the checks in __call_rcu(), which is common code invoked by
> > > > > > kfree_rcu(), seem to be doing their job, then.  These do keep a per-CPU
> > > > > > counter, which can be adjusted via rcutree.blimit, which defaults
> > > > > > to taking evasive action if more than 10K callbacks are waiting on a
> > > > > > given CPU.
> > > > > > 
> > > > > > My concern was that you might be overrunning that limit in way less
> > > > > > than a grace period (as in about a hundred microseconds).  My concern
> > > > > > was of course unfounded -- you take several grace periods to push 10K
> > > > > > callbacks through.
> > > > > > 
> > > > > > 							Thanx, Paul
> > > > > 
> > > > > Gleb noted that Documentation/RCU/checklist.txt has this text:
> > > > > 
> > > > >         An especially important property of the synchronize_rcu()
> > > > >         primitive is that it automatically self-limits: if grace periods
> > > > >         are delayed for whatever reason, then the synchronize_rcu()
> > > > >         primitive will correspondingly delay updates.  In contrast,
> > > > >         code using call_rcu() should explicitly limit update rate in
> > > > >         cases where grace periods are delayed, as failing to do so can
> > > > >         result in excessive realtime latencies or even OOM conditions.
> > > > > 
> > > > > If call_rcu() is self-limiting, maybe this should be documented ...
> > > > 
> > > > It would be more accurate to say that it takes some measures to limit
> > > > the damage -- you can overwhelm these measures if you try hard enough.
> > > > 
> > > The question is: Is it safe to have a call_rcu() without any additional rate limiting
> > > on a user-triggerable code path?
> > 
> > That would be a good way to allow users to run your system out of memory,
> > especially on systems with limited memory.  (If you have several GB of
> > free space, you might be OK.)
> > 
> Thanks! Got it.

Does the following help?

							Thanx, Paul

------------------------------------------------------------------------

rcu: Document call_rcu() safety mechanisms and limitations

The call_rcu() family of primitives will take action to accelerate
grace periods when the number of callbacks pending on a given CPU
becomes excessive.  Although this safety mechanism can be useful,
it is no substitute for users of call_rcu() having rate-limit controls
in place.  This commit adds this nuance to the documentation.

Reported-by: "Michael S. Tsirkin" <mst@redhat.com>
Reported-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Gleb Natapov Dec. 6, 2013, 11:32 a.m. UTC | #1
On Thu, Dec 05, 2013 at 03:00:33PM -0800, Paul E. McKenney wrote:
> > > > The question is: Is it safe to have a call_rcu() without any additional rate limiting
> > > > on a user-triggerable code path?
> > > 
> > > That would be a good way to allow users to run your system out of memory,
> > > especially on systems with limited memory.  (If you have several GB of
> > > free space, you might be OK.)
> > > 
> > Thanks! Got it.
> 
> Does the following help?
> 
Looks good to me.

> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> rcu: Document call_rcu() safety mechanisms and limitations
> 
> The call_rcu() family of primitives will take action to accelerate
> grace periods when the number of callbacks pending on a given CPU
> becomes excessive.  Although this safety mechanism can be useful,
> it is no substitute for users of call_rcu() having rate-limit controls
> in place.  This commit adds this nuance to the documentation.
> 
> Reported-by: "Michael S. Tsirkin" <mst@redhat.com>
> Reported-by: Gleb Natapov <gleb@redhat.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
> index 91266193b8f4..5733e31836b5 100644
> --- a/Documentation/RCU/checklist.txt
> +++ b/Documentation/RCU/checklist.txt
> @@ -256,10 +256,11 @@ over a rather long period of time, but improvements are always welcome!
>  		variations on this theme.
>  
>  	b.	Limiting update rate.  For example, if updates occur only
> -		once per hour, then no explicit rate limiting is required,
> -		unless your system is already badly broken.  The dcache
> -		subsystem takes this approach -- updates are guarded
> -		by a global lock, limiting their rate.
> +		once per hour, then no explicit rate limiting is
> +		required, unless your system is already badly broken.
> +		Older versions of the dcache subsystem took this
> +		approach -- updates were guarded by a global lock,
> +		limiting their rate.
>  
>  	c.	Trusted update -- if updates can only be done manually by
>  		superuser or some other trusted user, then it might not
> @@ -268,7 +269,8 @@ over a rather long period of time, but improvements are always welcome!
>  		the machine.
>  
>  	d.	Use call_rcu_bh() rather than call_rcu(), in order to take
> -		advantage of call_rcu_bh()'s faster grace periods.
> +		advantage of call_rcu_bh()'s faster grace periods.  (This
> +		is only a partial solution, though.)
>  
>  	e.	Periodically invoke synchronize_rcu(), permitting a limited
>  		number of updates per grace period.
> @@ -276,6 +278,13 @@ over a rather long period of time, but improvements are always welcome!
>  	The same cautions apply to call_rcu_bh(), call_rcu_sched(),
>  	call_srcu(), and kfree_rcu().
>  
> +	Note that although these primitives do take action to avoid memory
> +	exhaustion when any given CPU has too many callbacks, a determined
> +	user could still exhaust memory.  This is especially the case
> +	if a system with a large number of CPUs has been configured to
> +	offload all of its RCU callbacks onto a single CPU, or if the
> +	system has relatively little free memory.
> +
>  9.	All RCU list-traversal primitives, which include
>  	rcu_dereference(), list_for_each_entry_rcu(), and
>  	list_for_each_safe_rcu(), must be either within an RCU read-side
> 

--
			Gleb.

Patch

diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index 91266193b8f4..5733e31836b5 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -256,10 +256,11 @@  over a rather long period of time, but improvements are always welcome!
 		variations on this theme.
 
 	b.	Limiting update rate.  For example, if updates occur only
-		once per hour, then no explicit rate limiting is required,
-		unless your system is already badly broken.  The dcache
-		subsystem takes this approach -- updates are guarded
-		by a global lock, limiting their rate.
+		once per hour, then no explicit rate limiting is
+		required, unless your system is already badly broken.
+		Older versions of the dcache subsystem took this
+		approach -- updates were guarded by a global lock,
+		limiting their rate.
 
 	c.	Trusted update -- if updates can only be done manually by
 		superuser or some other trusted user, then it might not
@@ -268,7 +269,8 @@  over a rather long period of time, but improvements are always welcome!
 		the machine.
 
 	d.	Use call_rcu_bh() rather than call_rcu(), in order to take
-		advantage of call_rcu_bh()'s faster grace periods.
+		advantage of call_rcu_bh()'s faster grace periods.  (This
+		is only a partial solution, though.)
 
 	e.	Periodically invoke synchronize_rcu(), permitting a limited
 		number of updates per grace period.
@@ -276,6 +278,13 @@  over a rather long period of time, but improvements are always welcome!
 	The same cautions apply to call_rcu_bh(), call_rcu_sched(),
 	call_srcu(), and kfree_rcu().
 
+	Note that although these primitives do take action to avoid memory
+	exhaustion when any given CPU has too many callbacks, a determined
+	user could still exhaust memory.  This is especially the case
+	if a system with a large number of CPUs has been configured to
+	offload all of its RCU callbacks onto a single CPU, or if the
+	system has relatively little free memory.
+
 9.	All RCU list-traversal primitives, which include
 	rcu_dereference(), list_for_each_entry_rcu(), and
 	list_for_each_safe_rcu(), must be either within an RCU read-side