From patchwork Thu Sep 28 18:09:50 2017
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 9976641
Subject: Re: [PATCH 7/7] fs-writeback: only allow one inflight and pending full flush
From: Jens Axboe
To: Jan Kara
Cc: Christoph Hellwig, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, clm@fb.com
Date: Thu, 28 Sep 2017 20:09:50 +0200
Message-ID: <214d2bcb-0697-c051-0f36-20cd0d8702b0@kernel.dk>
In-Reply-To: <20170925093532.GC5741@quack2.suse.cz>

On 09/25/2017 11:35 AM, Jan Kara wrote:
> On Thu 21-09-17 10:00:25, Jens Axboe wrote:
>> On 09/21/2017 09:36 AM, Jens Axboe wrote:
>>>> But more importantly, once we are not guaranteed that we only have
>>>> a single global wb_writeback_work per bdi_writeback, we should just
>>>> embed that into struct bdi_writeback instead of dynamically
>>>> allocating it.
>>>
>>> We could do this as a followup. But right now the logic is that we
>>> can have one started (inflight), and still have one new queued.
>>
>> Something like the below would fit on top to do that. Gets rid of the
>> allocation and embeds the work item for global start-all in the
>> bdi_writeback structure.
>
> Hum, so when we consider stuff like an embedded work item, I would somewhat
> prefer to handle this like we do for for_background and for_kupdate style
> writeback so that we don't have another special case. For these we don't
> queue any item; we just queue writeback work into the workqueue (via
> wb_wakeup()). When flusher work gets processed, wb_do_writeback() checks
> (after processing all normal writeback requests) whether the conditions for
> these special writeback styles are met and, if yes, it creates an on-stack
> work item and processes it (see wb_check_old_data_flush() and
> wb_check_background_flush()).
>
> So in this case we would just set some flag in bdi_writeback when memory
> reclaim needs help, and wb_do_writeback() would check for this flag and
> create and process writeback-all style writeback work. Granted, this does
> not preserve ordering of requests (basically any specific request gets
> priority over a writeback-whole-world request), but memory gets cleaned in
> either case, so the flusher should be doing what is needed.

How about something like the below? It's on top of the latest series,
which is in my wb-start-all branch. It handles start_all like the
background/kupdate style writeback, reusing the WB_start_all bit for that.

On a plane, so untested, but it seems pretty straightforward. It changes
the logic a little bit, as the WB_start_all bit isn't cleared until after
we're done with a flush-all request. At this point there is truly only one
flush-all inflight at any point in time, not one inflight and one
potentially queued.

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 399619c97567..9e24d604c59c 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -53,7 +53,6 @@ struct wb_writeback_work {
 	unsigned int for_background:1;
 	unsigned int for_sync:1;	/* sync(2) WB_SYNC_ALL writeback */
 	unsigned int auto_free:1;	/* free on completion */
-	unsigned int start_all:1;	/* nr_pages == 0 (all) writeback */
 	enum wb_reason reason;		/* why was writeback initiated? */
 
 	struct list_head list;		/* pending work list */
@@ -947,8 +946,6 @@ static unsigned long get_nr_dirty_pages(void)
 
 static void wb_start_writeback(struct bdi_writeback *wb, enum wb_reason reason)
 {
-	struct wb_writeback_work *work;
-
 	if (!wb_has_dirty_io(wb))
 		return;
 
@@ -958,35 +955,14 @@ static void wb_start_writeback(struct bdi_writeback *wb, enum wb_reason reason)
 	 * high frequency, causing pointless allocations of tons of
 	 * work items and keeping the flusher threads busy retrieving
 	 * that work. Ensure that we only allow one of them pending and
-	 * inflight at the time. It doesn't matter if we race a little
-	 * bit on this, so use the faster separate test/set bit variants.
+	 * inflight at the time.
 	 */
-	if (test_bit(WB_start_all, &wb->state))
+	if (test_bit(WB_start_all, &wb->state) ||
+	    test_and_set_bit(WB_start_all, &wb->state))
 		return;
 
-	set_bit(WB_start_all, &wb->state);
-
-	/*
-	 * This is WB_SYNC_NONE writeback, so if allocation fails just
-	 * wakeup the thread for old dirty data writeback
-	 */
-	work = kzalloc(sizeof(*work),
-		       GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
-	if (!work) {
-		clear_bit(WB_start_all, &wb->state);
-		trace_writeback_nowork(wb);
-		wb_wakeup(wb);
-		return;
-	}
-
-	work->sync_mode = WB_SYNC_NONE;
-	work->nr_pages = wb_split_bdi_pages(wb, get_nr_dirty_pages());
-	work->range_cyclic = 1;
-	work->reason = reason;
-	work->auto_free = 1;
-	work->start_all = 1;
-
-	wb_queue_work(wb, work);
+	wb->start_all_reason = reason;
+	wb_wakeup(wb);
 }
 
 /**
@@ -1838,14 +1814,6 @@ static struct wb_writeback_work *get_next_work_item(struct bdi_writeback *wb)
 		list_del_init(&work->list);
 	}
 	spin_unlock_bh(&wb->work_lock);
-
-	/*
-	 * Once we start processing a work item that had !nr_pages,
-	 * clear the wb state bit for that so we can allow more.
-	 */
-	if (work && work->start_all)
-		clear_bit(WB_start_all, &wb->state);
-
 	return work;
 }
 
@@ -1901,6 +1869,30 @@ static long wb_check_old_data_flush(struct bdi_writeback *wb)
 	return 0;
 }
 
+static long wb_check_start_all(struct bdi_writeback *wb)
+{
+	long nr_pages;
+
+	if (!test_bit(WB_start_all, &wb->state))
+		return 0;
+
+	nr_pages = get_nr_dirty_pages();
+	if (nr_pages) {
+		struct wb_writeback_work work = {
+			.nr_pages	= wb_split_bdi_pages(wb, nr_pages),
+			.sync_mode	= WB_SYNC_NONE,
+			.range_cyclic	= 1,
+			.reason		= wb->start_all_reason,
+		};
+
+		nr_pages = wb_writeback(wb, &work);
+	}
+
+	clear_bit(WB_start_all, &wb->state);
+	return nr_pages;
+}
+
+
 /*
  * Retrieve work items and do the writeback they describe
  */
@@ -1917,6 +1909,11 @@ static long wb_do_writeback(struct bdi_writeback *wb)
 	}
 
 	/*
+	 * Check for a flush-everything request
+	 */
+	wrote += wb_check_start_all(wb);
+
+	/*
 	 * Check for periodic writeback, kupdated() style
 	 */
 	wrote += wb_check_old_data_flush(wb);
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 420de5c7c7f9..f0f1df29d6b8 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -116,6 +116,7 @@ struct bdi_writeback {
 
 	struct fprop_local_percpu completions;
 	int dirty_exceeded;
+	int start_all_reason;
 
 	spinlock_t work_lock;		/* protects work_list & dwork scheduling */
 	struct list_head work_list;
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 9b57f014d79d..19a0ea08e098 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -286,7 +286,6 @@ DEFINE_EVENT(writeback_class, name, \
 	TP_PROTO(struct bdi_writeback *wb), \
 	TP_ARGS(wb))
 
-DEFINE_WRITEBACK_EVENT(writeback_nowork);
 DEFINE_WRITEBACK_EVENT(writeback_wake_background);
 
 TRACE_EVENT(writeback_bdi_register,