From patchwork Fri Sep 18 19:48:14 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Uladzislau Rezki
X-Patchwork-Id: 11785931
From: "Uladzislau Rezki (Sony)"
To: LKML, RCU, linux-mm@kvack.org, Andrew Morton, "Paul E . McKenney"
Cc: Peter Zijlstra, Michal Hocko, Vlastimil Babka, Thomas Gleixner,
 "Theodore Y . Ts'o", Joel Fernandes, Sebastian Andrzej Siewior,
 Uladzislau Rezki, Oleksiy Avramchenko
Subject: [PATCH 1/4] rcu/tree: Add a work to allocate pages from regular context
Date: Fri, 18 Sep 2020 21:48:14 +0200
Message-Id: <20200918194817.48921-2-urezki@gmail.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20200918194817.48921-1-urezki@gmail.com>
References: <20200918194817.48921-1-urezki@gmail.com>

The current memory-allocation interface presents the following difficulties
that this patch is designed to overcome:

a) If built with CONFIG_PROVE_RAW_LOCK_NESTING, lockdep complains about
   a violation ("BUG: Invalid wait context") of the nesting rules. It
   performs raw_spinlock vs. spinlock nesting checks, i.e. it is not
   legal to acquire a spinlock_t while holding a raw_spinlock_t.

   Internally kfree_rcu() uses raw_spinlock_t, whereas the "page
   allocator" internally uses spinlock_t to access its zones. The code
   can also be broken from a higher-level point of view (see the sketch
   after this list):

       raw_spin_lock(&some_lock);
       kfree_rcu(some_pointer, some_field_offset);

b) If built with CONFIG_PREEMPT_RT, spinlock_t is converted into a
   sleepable variant. Invoking the page allocator from atomic contexts
   then leads to "BUG: scheduling while atomic".

c) call_rcu() is invoked from raw atomic context, and kfree_rcu() and
   kvfree_rcu() are expected to be callable from raw atomic context as
   well.
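To make the nesting problem concrete, here is a minimal, hypothetical sketch
(illustration only, not part of this patch); "my_lock", "my_obj" and
"bad_example" are made-up names:

#include <linux/rcupdate.h>
#include <linux/spinlock.h>

/*
 * kfree_rcu() may fall back to __get_free_page(), and the page allocator
 * takes zone spinlock_t locks. Doing that under a raw_spinlock_t trips
 * CONFIG_PROVE_RAW_LOCK_NESTING ("BUG: Invalid wait context") and, on
 * CONFIG_PREEMPT_RT where spinlock_t is sleepable, can result in
 * "BUG: scheduling while atomic".
 */
static DEFINE_RAW_SPINLOCK(my_lock);

struct my_obj {
	struct rcu_head rh;
};

static void bad_example(struct my_obj *obj)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&my_lock, flags);
	kfree_rcu(obj, rh);	/* may enter the page allocator */
	raw_spin_unlock_irqrestore(&my_lock, flags);
}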
Move page allocation out of the contexts which trigger the kvfree_rcu()
function and into a separate worker. When a k[v]free_rcu() per-cpu page
cache is empty, a fallback mechanism is used and a special job is
scheduled to refill the per-cpu cache. (A simplified, standalone sketch
of this refill-scheduling pattern follows the diff below.)

As a side effect, maintaining the bulk arrays in a separate worker,
rather than on demand, introduces other drawbacks:

a) There is an extra latency window: until pages are obtained via the
   special worker, the fallback mechanism is used instead of collecting
   pointers in the arrays.

b) It is impossible to predict how many pages will be required to cover
   a demand that is controlled by different workloads on various
   systems.

c) There is memory overhead, since we do not know how many pages should
   be preloaded.

These three concerns should be addressed by introducing a lock-free page
allocation interface.

Signed-off-by: Uladzislau Rezki (Sony)
---
 kernel/rcu/tree.c | 91 +++++++++++++++++++++++++----------------------
 1 file changed, 48 insertions(+), 43 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 548404489c04..4bfc46a1e9d1 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -177,7 +177,7 @@ module_param(rcu_unlock_delay, int, 0444);
  * per-CPU. Object size is equal to one page. This value
  * can be changed at boot time.
  */
-static int rcu_min_cached_objs = 2;
+static int rcu_min_cached_objs = 5;
 module_param(rcu_min_cached_objs, int, 0444);
 
 /* Retrieve RCU kthreads priority for rcutorture */
@@ -3100,6 +3100,8 @@ struct kfree_rcu_cpu {
 	 * lockless an access has to be protected by the
 	 * per-cpu lock.
 	 */
+	struct work_struct page_cache_work;
+	atomic_t work_in_progress;
 	struct llist_head bkvcache;
 	int nr_bkv_objs;
 };
@@ -3217,10 +3219,10 @@ static void kfree_rcu_work(struct work_struct *work)
 			}
 			rcu_lock_release(&rcu_callback_map);
 
-			krcp = krc_this_cpu_lock(&flags);
+			raw_spin_lock_irqsave(&krcp->lock, flags);
 			if (put_cached_bnode(krcp, bkvhead[i]))
 				bkvhead[i] = NULL;
-			krc_this_cpu_unlock(krcp, flags);
+			raw_spin_unlock_irqrestore(&krcp->lock, flags);
 
 			if (bkvhead[i])
 				free_page((unsigned long) bkvhead[i]);
@@ -3347,6 +3349,42 @@ static void kfree_rcu_monitor(struct work_struct *work)
 	raw_spin_unlock_irqrestore(&krcp->lock, flags);
 }
 
+static void fill_page_cache_func(struct work_struct *work)
+{
+	struct kvfree_rcu_bulk_data *bnode;
+	struct kfree_rcu_cpu *krcp =
+		container_of(work, struct kfree_rcu_cpu,
+			page_cache_work);
+	unsigned long flags;
+	bool pushed;
+	int i;
+
+	for (i = 0; i < rcu_min_cached_objs; i++) {
+		/*
+		 * We would like to minimize a reclaiming process,
+		 * that is why GFP_NOWAIT is here. It can wakeup a
+		 * kswapd, what is fine, because somebody soon or
+		 * later will kick it to get the freelist back to
+		 * the watermarks.
+		 */
+		bnode = (struct kvfree_rcu_bulk_data *)
+			__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
+
+		if (bnode) {
+			raw_spin_lock_irqsave(&krcp->lock, flags);
+			pushed = put_cached_bnode(krcp, bnode);
+			raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+			if (!pushed) {
+				free_page((unsigned long) bnode);
+				break;
+			}
+		}
+	}
+
+	atomic_set(&krcp->work_in_progress, 0);
+}
+
 static inline bool
 kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
 {
@@ -3363,32 +3401,8 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
 	if (!krcp->bkvhead[idx] ||
 			krcp->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
 		bnode = get_cached_bnode(krcp);
-		if (!bnode) {
-			/*
-			 * To keep this path working on raw non-preemptible
-			 * sections, prevent the optional entry into the
-			 * allocator as it uses sleeping locks. In fact, even
-			 * if the caller of kfree_rcu() is preemptible, this
-			 * path still is not, as krcp->lock is a raw spinlock.
-			 * With additional page pre-allocation in the works,
-			 * hitting this return is going to be much less likely.
-			 */
-			if (IS_ENABLED(CONFIG_PREEMPT_RT))
-				return false;
-
-			/*
-			 * NOTE: For one argument of kvfree_rcu() we can
-			 * drop the lock and get the page in sleepable
-			 * context. That would allow to maintain an array
-			 * for the CONFIG_PREEMPT_RT as well if no cached
-			 * pages are available.
-			 */
-			bnode = (struct kvfree_rcu_bulk_data *)
-				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
-		}
-		/* Switch to emergency path. */
-		if (unlikely(!bnode))
+		if (!bnode)
 			return false;
 
 		/* Initialize the new block. */
@@ -3422,6 +3436,7 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
 	unsigned long flags;
 	struct kfree_rcu_cpu *krcp;
+	bool irq_disabled = irqs_disabled();
 	bool success;
 	void *ptr;
 
@@ -3452,12 +3467,12 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 		goto unlock_return;
 	}
 
-	/*
-	 * Under high memory pressure GFP_NOWAIT can fail,
-	 * in that case the emergency path is maintained.
-	 */
 	success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr);
 	if (!success) {
+		// TODO: schedule the work from the hrtimer.
+		if (!irq_disabled && !atomic_xchg(&krcp->work_in_progress, 1))
+			queue_work(system_highpri_wq, &krcp->page_cache_work);
+
 		if (head == NULL)
 			// Inline if kvfree_rcu(one_arg) call.
 			goto unlock_return;
@@ -4449,24 +4464,14 @@ static void __init kfree_rcu_batch_init(void)
 
 	for_each_possible_cpu(cpu) {
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
-		struct kvfree_rcu_bulk_data *bnode;
 
 		for (i = 0; i < KFREE_N_BATCHES; i++) {
 			INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
 			krcp->krw_arr[i].krcp = krcp;
 		}
 
-		for (i = 0; i < rcu_min_cached_objs; i++) {
-			bnode = (struct kvfree_rcu_bulk_data *)
-				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
-
-			if (bnode)
-				put_cached_bnode(krcp, bnode);
-			else
-				pr_err("Failed to preallocate for %d CPU!\n", cpu);
-		}
-
 		INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
+		INIT_WORK(&krcp->page_cache_work, fill_page_cache_func);
 		krcp->initialized = true;
 	}
 	if (register_shrinker(&kfree_rcu_shrinker))
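For readers unfamiliar with the single-flight guard used above (the
atomic_xchg() on work_in_progress), here is a simplified, standalone sketch of
the same pattern; "refill_in_progress", "refill_work", "refill_func" and
"request_refill" are made-up names, and unlike this sketch the patch keeps the
flag and work item per-CPU in struct kfree_rcu_cpu and also skips queueing
when interrupts are disabled:

#include <linux/atomic.h>
#include <linux/workqueue.h>

static atomic_t refill_in_progress = ATOMIC_INIT(0);

static void refill_func(struct work_struct *work)
{
	/* ... refill the per-CPU page cache with GFP_NOWAIT pages ... */

	/* Open the gate again: the next request may queue the work. */
	atomic_set(&refill_in_progress, 0);
}

static DECLARE_WORK(refill_work, refill_func);

static void request_refill(void)
{
	/*
	 * atomic_xchg() returns the previous value, so only the caller
	 * that flips 0 -> 1 queues the work; later callers are no-ops
	 * until refill_func() clears the flag.
	 */
	if (!atomic_xchg(&refill_in_progress, 1))
		queue_work(system_highpri_wq, &refill_work);
}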