From patchwork Wed Jun 20 19:02:30 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 10478497
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
 by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 4E930601D7
 for ; Wed, 20 Jun 2018 19:02:55 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
 by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3B72428DCE
 for ; Wed, 20 Jun 2018 19:02:55 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
 id 3028828DDD; Wed, 20 Jun 2018 19:02:55 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
 pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED,
 DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,
 UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
 by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8B92A28DCE
 for ; Wed, 20 Jun 2018 19:02:54 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1754682AbeFTTCw (ORCPT ); Wed, 20 Jun 2018 15:02:52 -0400
Received: from userp2120.oracle.com ([156.151.31.85]:53916 "EHLO
 userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1754521AbeFTTCv (ORCPT );
 Wed, 20 Jun 2018 15:02:51 -0400
Received: from pps.filterd (userp2120.oracle.com [127.0.0.1])
 by userp2120.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w5KIsxug121342;
 Wed, 20 Jun 2018 19:02:33 GMT
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com;
 h=date : from : to : cc : subject : message-id : references : mime-version :
 content-type : in-reply-to; s=corp-2017-10-26;
 bh=PQPlyK/3NQFN5fntgjDx0oTQCx5rjrGgIoXhp8FDASs=;
 b=og0U0zQh/Gz7X9I1ZzEeArrLe75sIuB77ynTUlimkOjtHuftccJh4euXgN8wkClltYrs
 bA9fK27ZCeSd3YgJSqmOpVznawIZvf5ESaVGVBgK0KOmRz2DJy09xhfzDa7ND5WcMsug
 y70Xv1OUdaUEKxNHd7sSX7ThClrortikRA+U2Klyq6La30eqzREfG79tcm8WCJ+GFPLN
 mMLGIR0Vd7yQl1nsmuMhfNlR63NojH1PVreuU/hYpmydBMuIE1uVa0h5TDbV5mB3KkQ3
 LNk9Tqm4pilPBgE3ujwepldYAiRfyW+nSTE17Daj/h2UE4eAEfvG9sLBMi1Rn6nB/R9A
 rw==
Received: from aserv0022.oracle.com (aserv0022.oracle.com [141.146.126.234])
 by userp2120.oracle.com with ESMTP id 2jmu6xwsy2-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
 Wed, 20 Jun 2018 19:02:33 +0000
Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236])
 by aserv0022.oracle.com (8.14.4/8.14.4) with ESMTP id w5KJ2WBh025243
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
 Wed, 20 Jun 2018 19:02:32 GMT
Received: from abhmp0015.oracle.com (abhmp0015.oracle.com [141.146.116.21])
 by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id w5KJ2Vku000895;
 Wed, 20 Jun 2018 19:02:32 GMT
Received: from localhost (/67.169.218.210) by default (Oracle Beehive Gateway
 v4.0) with ESMTP ; Wed, 20 Jun 2018 12:02:31 -0700
Date: Wed, 20 Jun 2018 12:02:30 -0700
From: "Darrick J. Wong"
To: Brian Foster
Cc: Christoph Hellwig , linux-xfs@vger.kernel.org,
 linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 23/24] iomap: add support for sub-pagesize buffered I/O
 without buffer heads
Message-ID: <20180620190230.GB4838@magnolia>
References: <20180615130209.1970-1-hch@lst.de>
 <20180615130209.1970-24-hch@lst.de> <20180619165211.GD2806@bfoster>
 <20180620075655.GA2668@lst.de> <20180620143252.GE3241@bfoster>
 <20180620160803.GA4838@magnolia> <20180620181259.GD4493@bfoster>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20180620181259.GD4493@bfoster>
User-Agent: Mutt/1.9.4 (2018-02-28)
X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8930
 signatures=668702
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=2
 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0
 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx
 scancount=1 engine=8.0.1-1805220000 definitions=main-1806200208
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

On Wed, Jun 20, 2018 at 02:12:59PM -0400, Brian Foster wrote:
> On Wed, Jun 20, 2018 at 09:08:03AM -0700, Darrick J. Wong wrote:
> > On Wed, Jun 20, 2018 at 10:32:53AM -0400, Brian Foster wrote:
> > > Sending again without the attachment... Christoph, let me know if it
> > > didn't hit your mbox at least.
> > >
> > > On Wed, Jun 20, 2018 at 09:56:55AM +0200, Christoph Hellwig wrote:
> > > > On Tue, Jun 19, 2018 at 12:52:11PM -0400, Brian Foster wrote:
> > > > > > +	/*
> > > > > > +	 * Move the caller beyond our range so that it keeps making progress.
> > > > > > +	 * For that we have to include any leading non-uptodate ranges, but
> > > > >
> > > > > Do you mean "leading uptodate ranges" here?
> > > > > E.g., pos is pushed forward
> > > > > past those ranges we don't have to read, so (pos - orig_pos) reflects
> > > > > the initial uptodate range while plen reflects the length we have to
> > > > > read..?
> > > >
> > > > Yes.
> > > >
> > > > > > +
> > > > > > +	do {
> > > > >
> > > > > Kind of a nit, but this catches my eye and manages to confuse me every
> > > > > time I look at it. A comment along the lines of:
> > > > >
> > > > > /*
> > > > >  * Pass in the block aligned start/end so we get back block
> > > > >  * aligned/adjusted poff/plen and can compare with unaligned
> > > > >  * from/to below.
> > > > >  */
> > > > >
> > > > > ... would be nice here, IMO.
> > > >
> > > > Fine with me.
> > > >
> > > > > > +		iomap_adjust_read_range(inode, iop, &block_start,
> > > > > > +				block_end - block_start, &poff, &plen);
> > > > > > +		if (plen == 0)
> > > > > > +			break;
> > > > > > +
> > > > > > +		if ((from > poff && from < poff + plen) ||
> > > > > > +		    (to > poff && to < poff + plen)) {
> > > > > > +			status = iomap_read_page_sync(inode, block_start, page,
> > > > > > +					poff, plen, from, to, iomap);
> > > > >
> > > > > After taking another look at the buffer head path, it does look like we
> > > > > have slightly different behavior here. IIUC, the former reads only the
> > > > > !uptodate blocks that fall along the from/to boundaries. Here, if say
> > > > > from = 1, to = PAGE_SIZE and the page is fully !uptodate, it looks like
> > > > > we'd read the entire page worth of blocks (assuming contiguous 512b
> > > > > blocks, for example). Intentional? Doesn't seem like a big deal, but
> > > > > could be worth a followup fix.
> > > >
> > > > It wasn't actually intentional, but I actually think it is the right thing
> > > > in the end, as it means we'll often do a single read instead of two
> > > > separate ones.
> > >
> > > Ok, but if that's the argument, then shouldn't we not be doing two
> > > separate I/Os if the middle range of a write happens to be already
> > > uptodate?
> > > Or more for that matter, if the page happens to be sparsely
> > > uptodate for whatever reason..?
> > >
> > > OTOH, I also do wonder a bit whether that may always be the right thing
> > > if we consider cases like 64k page size arches and whatnot. It seems
> > > like we could end up consuming more bandwidth for reads than we
> > > typically have in the past. That said, unless there's a functional
> > > reason to change this I think it's fine to optimize this path for these
> > > kinds of corner cases in follow on patches.
> > >
> > > Finally, this survived xfstests on a sub-page block size fs but I
> > > managed to hit an fsx error:
> > >
> > > Mapped Read: non-zero data past EOF (0x21a1f) page offset 0xc00 is
> > > 0xc769
> > >
> > > It repeats 100% of the time for me using the attached fsxops file (with
> > > --replay-ops) on XFS w/ -bsize=1k. It doesn't occur without the final
> > > patch to enable sub-page block iomap on XFS.
> >
> > Funny, because I saw the exact same complaint from generic/127 last
> > night on my development tree that doesn't include hch's patches and was
> > going to see if I could figure out what's going on.
> >
> > FWIW it's been happening sporadically for a few weeks now but every time
> > I've tried to analyze it I (of course) couldn't get it to reproduce. :)
> >
> > I also ran this series (all of it, including the subpagesize config)
> > last night and aside from it stumbling over an unrelated locking problem
> > seemed fine....
> >
>
> That's interesting. Perhaps it's a pre-existing issue in that case and
> the iomap stuff just changes the timing to make it reliably reproducible
> on this particular system.
>
> I only ran it a handful of times in both cases and now have lost access
> to the server. Once I regain access, I'll try running for longer on
> for-next to see if the same thing eventually triggers.

I managed to cut the testcase down to a nine-line fsx script and so
turned it into a fstests regression case.
It seems to reproduce 100% on
scsi disks and doesn't at all on pmem. Note that changing the second to
last line of the fsxops script to call punch_hole instead of zero_range
triggers it too. I've also narrowed it down to something going wrong
w.r.t. handling the page cache somewhere under xfs_free_file_space.
(See attached diff...)

--D

generic: mread past eof shows nonzero contents

Certain sequences of generic/127 invocations complain about being able
to mread nonzero contents past eof. Replicate that here as a regression
test.

Signed-off-by: Darrick J. Wong
---
 tests/generic/708     |   54 +++++++++++++++++++++++++++++++++++++++++++++++++
 tests/generic/708.out |    2 ++
 tests/generic/group   |    1 +
 3 files changed, 57 insertions(+)
 create mode 100755 tests/generic/708
 create mode 100644 tests/generic/708.out

diff --git a/tests/generic/708 b/tests/generic/708
new file mode 100755
index 00000000..fa5584f5
--- /dev/null
+++ b/tests/generic/708
@@ -0,0 +1,54 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2018 Oracle. All Rights Reserved.
+#
+# FS QA Test No. 708
+#
+# Test a specific sequence of fsx operations that causes an mmap read past
+# eof to return nonzero contents.
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+
+rm -f $seqres.full
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+
+cat >> $tmp.fsxops << ENDL
+fallocate 0x77e2 0x5f06 0x269a2 keep_size
+mapwrite 0x2e7fc 0x42ba 0x3f989
+write 0x67a9 0x714e 0x3f989
+write 0x39f96 0x185a 0x3f989
+collapse_range 0x36000 0x8000 0x3f989
+mapread 0x74c0 0x1bb3 0x3e2d0
+truncate 0x0 0x8aa2 0x3e2d0
+zero_range 0x1265 0x783d 0x8aa2
+mapread 0x7bd8 0xeca 0x8aa2
+ENDL
+
+victim=$SCRATCH_MNT/a
+touch $victim
+$here/ltp/fsx --replay-ops $tmp.fsxops $victim > $tmp.output || cat $tmp.output
+
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/generic/708.out b/tests/generic/708.out
new file mode 100644
index 00000000..33c478ad
--- /dev/null
+++ b/tests/generic/708.out
@@ -0,0 +1,2 @@
+QA output created by 708
+Silence is golden
diff --git a/tests/generic/group b/tests/generic/group
index 83a6fdab..1a1a0a6e 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -501,3 +501,4 @@
 496 auto quick swap
 497 auto quick swap collapse
 498 auto quick log
+708 auto quick rw collapse
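
For anyone following the offsets, here is a rough sanity check of the last three fsxops lines. It is only a sketch: it reads the first two hex columns as offset and length (an assumption about the replay format), and PAGE_SIZE=4096 is an assumed value that the thread does not state. Under those assumptions, both the zero_range and the final mapread end exactly at the truncated EOF, so the bytes of interest are the ones between EOF and the end of that page.

```shell
#!/bin/bash
# Illustrative only; the hex values are copied from the fsxops lines in the test.
# PAGE_SIZE is an assumption (4k pages), not something stated in the thread.
PAGE_SIZE=4096

eof=$((0x8aa2))                 # new size from "truncate 0x0 0x8aa2"
zero_end=$((0x1265 + 0x783d))   # end of "zero_range 0x1265 0x783d"
read_end=$((0x7bd8 + 0xeca))    # end of "mapread 0x7bd8 0xeca"

# Offset of EOF inside its page, and how many bytes of that page lie past EOF.
eof_in_page=$((eof % PAGE_SIZE))
tail_bytes=$(( (PAGE_SIZE - eof_in_page) % PAGE_SIZE ))

printf 'eof=0x%x zero_end=0x%x read_end=0x%x\n' "$eof" "$zero_end" "$read_end"
printf 'eof offset in page=0x%x, bytes past eof in page=%d\n' \
	"$eof_in_page" "$tail_bytes"
```

With a different page size only eof_in_page and tail_bytes change; the two end offsets still land on EOF.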