From patchwork Fri Oct 10 14:23:41 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Kara
X-Patchwork-Id: 5065831
From: Jan Kara
To: linux-fsdevel@vger.kernel.org
Date: Fri, 10 Oct 2014 16:23:41 +0200
Message-Id: <1412951028-4085-37-git-send-email-jack@suse.cz>
X-Mailer: git-send-email 1.8.1.4
In-Reply-To: <1412951028-4085-1-git-send-email-jack@suse.cz>
References: <1412951028-4085-1-git-send-email-jack@suse.cz>
Cc: Dave Kleikamp, jfs-discussion@lists.sourceforge.net, tytso@mit.edu,
 Jeff Mahoney, Mark Fasheh, Dave Chinner, reiserfs-devel@vger.kernel.org,
 xfs@oss.sgi.com, cluster-devel@redhat.com, Dave Chinner, Jan Kara,
 linux-ext4@vger.kernel.org, Steven Whitehouse, ocfs2-devel@oss.oracle.com,
 viro@zeniv.linux.org.uk
Subject: [Ocfs2-devel] [PATCH] writeback: plug writeback at a high level

From: Dave Chinner

tl;dr: 3 lines of code, 86% better fsmark throughput consuming 13%
less CPU and 43% lower runtime.

Doing writeback on lots of little files causes terrible IOPS storms
because of the per-mapping writeback plugging we do. This
essentially causes immediate dispatch of IO for each mapping,
regardless of the context in which writeback is occurring.

IOWs, running a concurrent write-lots-of-small 4k files using fsmark
on XFS results in a huge number of IOPS being issued for data
writes. Metadata writes are sorted and plugged at a high level by
XFS, so aggregate nicely into large IOs. However, data writeback IOs
are dispatched in individual 4k IOs - even when the blocks of two
consecutively written files are adjacent - because the underlying
block device is fast enough not to congest on such IO.

This behaviour is not SSD related - anything with hardware caches is
going to see the same benefits as the IO rates are limited only by
how fast adjacent IOs can be sent to the hardware caches for
aggregation. Hence the speed of the physical device is irrelevant to
this common writeback workload (happens every time you untar a
tarball!) - performance is limited by the overhead of dispatching
individual IOs from a single writeback thread.

Test VM: 16p, 16GB RAM, 2xSSD in RAID0, 500TB sparse XFS filesystem,
metadata CRCs enabled.

Test:

$ ./fs_mark -D 10000 -S0 -n 10000 -s 4096 -L 120 \
	-d /mnt/scratch/0 -d /mnt/scratch/1 -d /mnt/scratch/2 \
	-d /mnt/scratch/3 -d /mnt/scratch/4 -d /mnt/scratch/5 \
	-d /mnt/scratch/6 -d /mnt/scratch/7

Result:

             wall    sys     create rate     Physical write IO
             time    CPU     (avg files/s)   IOPS     Bandwidth
             -----   -----   -------------   ------   ---------
unpatched    5m54s   15m32s  32,500+/-2200   28,000   150MB/s
patched      3m19s   13m28s  52,900+/-1800    1,500   280MB/s
improvement  -43.8%  -13.3%     +62.7%       -94.6%   +86.6%

Signed-off-by: Dave Chinner
Signed-off-by: Jan Kara
---
 fs/fs-writeback.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 279292ba9403..d935fd3796ba 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -599,6 +599,9 @@ static long generic_writeback_inodes(struct wb_writeback_work *work)
 	unsigned long end_time = jiffies + HZ / 10;
 	long write_chunk;
 	long wrote = 0;  /* count both pages and inodes */
+	struct blk_plug plug;
+
+	blk_start_plug(&plug);
 
 	spin_lock(&wb->list_lock);
 	while (1) {
@@ -688,6 +691,8 @@ static long generic_writeback_inodes(struct wb_writeback_work *work)
 out:
 	spin_unlock(&wb->list_lock);
 
+	blk_finish_plug(&plug);
+
 	return wrote;
 }
 
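
For readers less familiar with plugging, the IOPS drop in the table above can be illustrated with a toy userspace model. This is only a sketch and not kernel code: struct toy_req, dispatch_unplugged() and dispatch_plugged() are hypothetical names invented for the illustration. It assumes that a request held on a plug is merged with the previous one whenever it starts exactly where that one ended, and it ignores real block layer limits such as maximum request size and plug list length.

```c
#include <stddef.h>

/* Hypothetical toy request: a contiguous run of sectors. Not a kernel type. */
struct toy_req {
	unsigned long start;	/* first sector */
	unsigned long len;	/* number of sectors */
};

/*
 * Model of dispatch without a plug held across submissions: every
 * submitted request reaches the device on its own, so the number of
 * dispatched IOs equals the number of submitted requests.
 */
size_t dispatch_unplugged(const struct toy_req *reqs, size_t n)
{
	(void)reqs;
	return n;
}

/*
 * Model of dispatch with one plug held across all submissions: a
 * request that starts exactly where the previous one ended is merged
 * into it, so only the first request of each contiguous run counts as
 * a dispatched IO.
 */
size_t dispatch_plugged(const struct toy_req *reqs, size_t n)
{
	size_t dispatched = 0;
	unsigned long next = 0;	/* sector following the previous request */
	size_t i;

	for (i = 0; i < n; i++) {
		if (i == 0 || reqs[i].start != next)
			dispatched++;
		next = reqs[i].start + reqs[i].len;
	}
	return dispatched;
}
```

With three adjacent 4k writes followed by one distant write, the unplugged model counts four IOs and the plugged model counts two, which is the same kind of aggregation the patch enables for blocks of consecutively written small files.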