From patchwork Thu Nov 8 22:19:07 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 10675073 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1E11E1709 for ; Thu, 8 Nov 2018 22:19:22 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1099D2B0FE for ; Thu, 8 Nov 2018 22:19:22 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 04B232E40F; Thu, 8 Nov 2018 22:19:22 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI,SUBJ_OBFU_PUNCT_FEW autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9AA492B0FE for ; Thu, 8 Nov 2018 22:19:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731299AbeKIH4x (ORCPT ); Fri, 9 Nov 2018 02:56:53 -0500 Received: from ipmail03.adl2.internode.on.net ([150.101.137.141]:20088 "EHLO ipmail03.adl2.internode.on.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728311AbeKIH4x (ORCPT ); Fri, 9 Nov 2018 02:56:53 -0500 Received: from ppp59-167-129-252.static.internode.on.net (HELO dastard) ([59.167.129.252]) by ipmail03.adl2.internode.on.net with ESMTP; 09 Nov 2018 08:49:15 +1030 Received: from discord.disaster.area ([192.168.1.111]) by dastard with esmtp (Exim 4.80) (envelope-from ) id 1gKsdr-00023k-Mu; Fri, 09 Nov 2018 09:19:11 +1100 Received: from dave by discord.disaster.area with local (Exim 4.91) (envelope-from ) id 1gKsdr-0008IO-KD; Fri, 09 Nov 2018 09:19:11 +1100 From: Dave Chinner To: linux-xfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Subject: [PATCH 0/2]: dedupe/copy_file_range fixes Date: Fri, 9 Nov 2018 09:19:07 +1100 Message-Id: <20181108221909.27602-1-david@fromorbit.com> X-Mailer: git-send-email 2.19.1 MIME-Version: 1.0 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Hi folks, Here's the first couple of fixes that fell out of the tree last night after Darrick added and clone/dedupe_file_range to fsx and copy_file_range to both fsx and fsstress yesterday. There will be more coming. FYI, do_splice_direct() is broken for direct IO on XFS. It's been broken ever since the iomap infrastructure was introduced in 4.10 two years ago. Any code that can use splice to read data via direct IO is definitely broken (copy_file_range demonstrates that). And my initial testing on 64k block sizes indicates that direct IO writes via splice are also broken because the short writes expose stale data in partially written blocks. As such, copy_file_range() is broken for direct IO on XFS w/o reflink support (i.e. default configs). It has been broken since at least 4.10, and maybe as far back as when copy_file_range() was first introduced in 4.5. Yet in all this time, nobody has noticed that is is broken. That's because nobody added test infrastructure support for the copy_file_range() syscall. Sure we added a command to xfs_io to be able to copy files via the interface, but that doesn't mean the interface is exercised in any useful manner. So, how long do you think it took fsx to find these problems? A few hours? A few million operations? Nope. It took 32 operations and 0.3s - it failed on the second copy_file_range() call it made. This is from generic/263: skipping zero size read truncating to largest ever: 0xe400 4 trunc from 0x0 to 0xe400 5 insert from 0x6000 to 0x18000, (0x12000 bytes) 6 zero from 0x91be to 0x1edf6, (0x15c38 bytes) 7 write 0x3ac00 thru 0x3cdff (0x2200 bytes) 8 mapread 0x36000 thru 0x3be19 (0x5e1a bytes) 9 mapwrite 0x73200 thru 0x7928c (0x608d bytes) 10 mapread 0x3d000 thru 0x3f9c2 (0x29c3 bytes) 11 collapse from 0x2b000 to 0x45000, (0x1a000 bytes) 12 punch from 0x495fa to 0x5f28d, (0x15c93 bytes) 13 falloc from 0x2f42a to 0x4a8f4 (0x1b4ca bytes) 14 zero from 0x530b7 to 0x5f28d, (0xc1d6 bytes) 15 mapwrite 0x55e00 thru 0x70d6e (0x1af6f bytes) 16 read 0x2e000 thru 0x38fff (0xb000 bytes) 17 collapse from 0x3f000 to 0x4f000, (0x10000 bytes) 18 copy from 0x28000 to 0x43000, (0x1b000 bytes) at 0x4400 19 collapse from 0x2c000 to 0x45000, (0x19000 bytes) 20 write 0x54a00 thru 0x709ff (0x1c000 bytes) 21 read 0x53000 thru 0x69fff (0x17000 bytes) 22 mapwrite 0x1f200 thru 0x394bb (0x1a2bc bytes) 23 mapread 0x43000 thru 0x5a2d8 (0x172d9 bytes) 24 mapwrite 0x23000 thru 0x38812 (0x15813 bytes) 25 write 0x47800 thru 0x587ff (0x11000 bytes) 26 clone from 0x3000 to 0x12000, (0xf000 bytes) at 0x61000 skipping unsupported clone range 27 read 0x6c000 thru 0x6efff (0x3000 bytes) 28 dedupe from 0x12000 to 0x1e000, (0xc000 bytes) at 0x4000 skipping unsupported dedupe range 29 insert from 0x31000 to 0x33000, (0x2000 bytes) fallocating to largest ever: 0x49915 30 falloc from 0x2deac to 0x49915 (0x1ba69 bytes) 31 dedupe from 0x6f000 to 0x72000, (0x3000 bytes) at 0x25000 skipping unsupported dedupe range 32 copy from 0x4b000 to 0x64000, (0x19000 bytes) at 0x2800 copy range: 0x4b000 to 0x64000 at 0x2800 do_copy_range:: Resource temporarily unavailable The ease of reproducing this bug indicates just how badly our engineering processes have failed here. They simply haven't been followed at all - we know that if we don't add test coverage with new features and/or interfaces, then the new code will be buggy and broken. Yet maintainers still seem to ignore whether code is testable or even fit for purpose and merge new code that has not and cannot be tested adequately now and into the future. I'm not a happy camper right now. -Dave.