From patchwork Thu Dec 5 23:00:33 2013
X-Patchwork-Submitter: "Paul E. McKenney"
X-Patchwork-Id: 3291491
Date: Thu, 5 Dec 2013 15:00:33 -0800
From: "Paul E. McKenney"
To: Gleb Natapov
Tsirkin" , kvm@vger.kernel.org, mtosatti@redhat.com Subject: Re: [PATCHv2] KVM: optimize apic interrupt delivery Message-ID: <20131205230033.GB15492@linux.vnet.ibm.com> Reply-To: paulmck@linux.vnet.ibm.com References: <50503D92.7090108@redhat.com> <20120912123441.GQ20907@redhat.com> <505081E9.8080505@redhat.com> <20120912124426.GR20907@redhat.com> <20120912151354.GO4257@linux.vnet.ibm.com> <20131126162402.GA24806@redhat.com> <20131126193506.GE4137@linux.vnet.ibm.com> <20131127080009.GO959@redhat.com> <20131127170635.GM4137@linux.vnet.ibm.com> <20131128085506.GB959@redhat.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20131128085506.GB959@redhat.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 13120523-9332-0000-0000-000002652D0F Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Thu, Nov 28, 2013 at 10:55:06AM +0200, Gleb Natapov wrote: > On Wed, Nov 27, 2013 at 09:06:36AM -0800, Paul E. McKenney wrote: > > On Wed, Nov 27, 2013 at 10:00:09AM +0200, Gleb Natapov wrote: > > > On Tue, Nov 26, 2013 at 11:35:06AM -0800, Paul E. McKenney wrote: > > > > On Tue, Nov 26, 2013 at 06:24:13PM +0200, Michael S. Tsirkin wrote: > > > > > On Wed, Sep 12, 2012 at 08:13:54AM -0700, Paul E. McKenney wrote: > > > > > > On Wed, Sep 12, 2012 at 03:44:26PM +0300, Gleb Natapov wrote: > > > > > > > On Wed, Sep 12, 2012 at 03:36:57PM +0300, Avi Kivity wrote: > > > > > > > > On 09/12/2012 03:34 PM, Gleb Natapov wrote: > > > > > > > > > On Wed, Sep 12, 2012 at 10:45:22AM +0300, Avi Kivity wrote: > > > > > > > > >> On 09/12/2012 04:03 AM, Paul E. McKenney wrote: > > > > > > > > >> >> > > Paul, I'd like to check something with you here: > > > > > > > > >> >> > > this function can be triggered by userspace, > > > > > > > > >> >> > > any number of times; we allocate > > > > > > > > >> >> > > a 2K chunk of memory that is later freed by > > > > > > > > >> >> > > kfree_rcu. > > > > > > > > >> >> > > > > > > > > > > >> >> > > Is there a risk of DOS if RCU is delayed while > > > > > > > > >> >> > > lots of memory is queued up in this way? > > > > > > > > >> >> > > If yes is this a generic problem with kfree_rcu > > > > > > > > >> >> > > that should be addressed in core kernel? > > > > > > > > >> >> > > > > > > > > > >> >> > There is indeed a risk. > > > > > > > > >> >> > > > > > > > > >> >> In our case it's a 2K object. Is it a practical risk? > > > > > > > > >> > > > > > > > > > >> > How many kfree_rcu()s per second can a given user cause to happen? > > > > > > > > >> > > > > > > > > >> Not much more than a few hundred thousand per second per process (normal > > > > > > > > >> operation is zero). > > > > > > > > >> > > > > > > > > > I managed to do 21466 per second. > > > > > > > > > > > > > > > > Strange, why so slow? > > > > > > > > > > > > > > > Because ftrace buffer overflows :) With bigger buffer I get 169940. > > > > > > > > > > > > Ah, good, should not be a problem. In contrast, if you ran kfree_rcu() in > > > > > > a tight loop, you could probably do in excess of 100M per CPU per second. > > > > > > Now -that- might be a problem. 
> > > > > >
> > > > > > Well, it -might- be a problem if you somehow figured out how to
> > > > > > allocate memory that quickly in a steady-state manner.  ;-)
> > > > > >
> > > > > > > > >> Good idea.  Michael, it should be easy to modify kvm-unit-tests
> > > > > > > > >> to write to the APIC ID register in a loop.
> > > > > > > > >>
> > > > > > > > > I did.  Memory consumption does not grow on an otherwise idle host.
> > > > > >
> > > > > > Very good -- the checks in __call_rcu(), which is common code invoked
> > > > > > by kfree_rcu(), seem to be doing their job, then.  These do keep a
> > > > > > per-CPU counter, which can be adjusted via rcutree.blimit, and which
> > > > > > defaults to taking evasive action if more than 10K callbacks are
> > > > > > waiting on a given CPU.
> > > > > >
> > > > > > My concern was that you might be overrunning that limit in way less
> > > > > > than a grace period (as in about a hundred microseconds).  My concern
> > > > > > was of course unfounded -- it takes several grace periods to push 10K
> > > > > > callbacks through.
> > > > > >
> > > > > > 							Thanx, Paul
> > > > >
> > > > > Gleb noted that Documentation/RCU/checklist.txt has this text:
> > > > >
> > > > > 	An especially important property of the synchronize_rcu()
> > > > > 	primitive is that it automatically self-limits: if grace periods
> > > > > 	are delayed for whatever reason, then the synchronize_rcu()
> > > > > 	primitive will correspondingly delay updates.  In contrast,
> > > > > 	code using call_rcu() should explicitly limit update rate in
> > > > > 	cases where grace periods are delayed, as failing to do so can
> > > > > 	result in excessive realtime latencies or even OOM conditions.
> > > > >
> > > > > If call_rcu() is self-limiting, maybe this should be documented ...
> > > >
> > > > It would be more accurate to say that it has some measures to limit
> > > > the damage -- you can overwhelm these measures if you try hard enough.
> > > >
> > > The question is: is it safe to have a call_rcu() without any additional
> > > rate limiting on a user-triggerable code path?
> >
> > That would be a good way to allow users to run your system out of memory,
> > especially on systems with limited memory.  (If you have several GB of
> > free space, you might be OK.)
> >
> Thanks! Got it.
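One way to get that rate limiting on a user-triggerable path is
technique (e) from the checklist below: periodically invoke
synchronize_rcu(), so the number of callbacks any user can queue per
grace period is bounded.  A minimal sketch, with invented names
(update_map(), struct kvm_like, and friends are illustrative, not the
actual KVM code):

	struct map_like {
		struct rcu_head rcu;
		/* ... ~2K of payload ... */
	};

	struct kvm_like {
		struct map_like __rcu *map;
		struct mutex map_lock;
	};

	#define UPDATES_PER_GP	1000	/* arbitrary per-grace-period budget */
	static atomic_t nr_updates;

	/* Called from a sleepable, user-triggerable path such as an ioctl. */
	static void update_map(struct kvm_like *k, struct map_like *new)
	{
		struct map_like *old;

		mutex_lock(&k->map_lock);
		old = rcu_dereference_protected(k->map,
						lockdep_is_held(&k->map_lock));
		rcu_assign_pointer(k->map, new);
		mutex_unlock(&k->map_lock);

		if (old)
			kfree_rcu(old, rcu);	/* deferred free of the old copy */

		/* Technique (e): every UPDATES_PER_GP updates, wait for a
		 * full grace period, bounding the callback backlog this
		 * path can create no matter how hard userspace hammers it. */
		if (atomic_inc_return(&nr_updates) % UPDATES_PER_GP == 0)
			synchronize_rcu();
	}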

Does the following help?

							Thanx, Paul

------------------------------------------------------------------------

rcu: Document call_rcu() safety mechanisms and limitations

The call_rcu() family of primitives will take action to accelerate
grace periods when the number of callbacks pending on a given CPU
becomes excessive.  Although this safety mechanism can be useful,
it is no substitute for users of call_rcu() having rate-limit controls
in place.  This commit adds this nuance to the documentation.

Reported-by: "Michael S. Tsirkin"
Reported-by: Gleb Natapov
Signed-off-by: Paul E. McKenney

---

diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index 91266193b8f4..5733e31836b5 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -256,10 +256,11 @@ over a rather long period of time, but improvements are always welcome!
 		variations on this theme.
 
 	b.	Limiting update rate.  For example, if updates occur only
-		once per hour, then no explicit rate limiting is required,
-		unless your system is already badly broken.  The dcache
-		subsystem takes this approach -- updates are guarded
-		by a global lock, limiting their rate.
+		once per hour, then no explicit rate limiting is
+		required, unless your system is already badly broken.
+		Older versions of the dcache subsystem took this
+		approach -- updates were guarded by a global lock,
+		limiting their rate.
 
 	c.	Trusted update -- if updates can only be done manually by
 		superuser or some other trusted user, then it might not
@@ -268,7 +269,8 @@ over a rather long period of time, but improvements are always welcome!
 		the machine.
 
 	d.	Use call_rcu_bh() rather than call_rcu(), in order to take
-		advantage of call_rcu_bh()'s faster grace periods.
+		advantage of call_rcu_bh()'s faster grace periods.  (This
+		is only a partial solution, though.)
 
 	e.	Periodically invoke synchronize_rcu(), permitting a limited
 		number of updates per grace period.
@@ -276,6 +278,13 @@ over a rather long period of time, but improvements are always welcome!
 	The same cautions apply to call_rcu_bh(), call_rcu_sched(),
 	call_srcu(), and kfree_rcu().
 
+	Note that although these primitives do take action to avoid memory
+	exhaustion when any given CPU has too many callbacks, a determined
+	user could still exhaust memory.  This is especially the case
+	if a system with a large number of CPUs has been configured to
+	offload all of its RCU callbacks onto a single CPU, or if the
+	system has relatively little free memory.
+
 9.	All RCU list-traversal primitives, which include
 	rcu_dereference(), list_for_each_entry_rcu(),
 	and list_for_each_safe_rcu(), must be either within an RCU read-side
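For completeness, the requirement in item 9 of the final hunk above, in
code form -- a minimal reader-side sketch, where struct foo, foo_head,
and use_foo() are invented names and use_foo() is a hypothetical
consumer:

	struct foo {
		struct list_head list;
		int datum;
	};
	static LIST_HEAD(foo_head);	/* modified elsewhere under a lock */

	static void scan_foos(void)
	{
		struct foo *p;

		rcu_read_lock();	/* begin read-side critical section */
		list_for_each_entry_rcu(p, &foo_head, list)
			use_foo(p);	/* hypothetical consumer; p is safe here */
		rcu_read_unlock();	/* after this, any p may be freed at any time */
	}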