From patchwork Wed Aug 30 13:38:22 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Amir Goldstein X-Patchwork-Id: 9929633 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 277C060309 for ; Wed, 30 Aug 2017 13:38:23 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0A8BF285EF for ; Wed, 30 Aug 2017 13:38:23 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F1CA328615; Wed, 30 Aug 2017 13:38:22 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.5 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FROM, RCVD_IN_DNSWL_HI, RCVD_IN_SORBS_SPAM autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 92E92285DE for ; Wed, 30 Aug 2017 13:38:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751372AbdH3NiD (ORCPT ); Wed, 30 Aug 2017 09:38:03 -0400 Received: from mail-wr0-f179.google.com ([209.85.128.179]:34820 "EHLO mail-wr0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751317AbdH3NiB (ORCPT ); Wed, 30 Aug 2017 09:38:01 -0400 Received: by mail-wr0-f179.google.com with SMTP id j29so17684346wre.2; Wed, 30 Aug 2017 06:38:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id; bh=ke3Av4JaTGE5bUjSTvp/S1zoE8+dSJR4S6TXjPhRgZA=; b=vdux4c6XN9Bgo4mANss1tUsc2KoV/r8OftLNpan4YqzTemubODGXi77YEMlA38C1Sy StbtZ1HQocl6H8aKWFm8nu8ZX9UDXriB3o6PjU9YP2oY6idOZQxN3ADpK+m02PWE8u5W KN1+IAWT/1J0WVLgu+fJY0V2jOYT/N7VPV0vbm8uX8Cbl/odsuWle+jPi5sXxL0yhkPi udnfE53lhPAPWYPtRk3gpA2342yxuAyMZoC65K8szdg9x00VZmAUK7BmmCjfhMnT8t3A twPZ5RGZKdlsdT0pVf8R82dBg2ZcdQSQvg5A357QiNScO80Tx1j4+gid3JTZCnixqMh8 W1ug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=ke3Av4JaTGE5bUjSTvp/S1zoE8+dSJR4S6TXjPhRgZA=; b=MfbWef1aYtt5StoGqaGaSes602OXwjnoYcAh9+T+kl7ir90TnNFzXBPtpl8VzQGP2W +eoFd1ddP99fLx+VO5grPRaHxrn40eY+qUH15WiHuSp0kfXj6NXwmogeJkVbZxg+dFc8 oeBjgQhVg7X8GBwzUBS9I5UKKbnw+bCyorBP0DGectCbPZf1Nh4tn2J3CjYO8GIbYUcJ NllLZm+MPlQjugAIIePr8g91KDvf4GSlTAGy+E8iVI8zQx2MVOckkTig9VrMOXh8vh3g k2OwDqKYlSR5Z6Q7E+CF/EYLdZyFORnPQX9bRBv3j7+aOw2/cJoG9tCVu8cs5icg1KHO 8v8Q== X-Gm-Message-State: AHYfb5gDjlC0VKCfQHIljybjxLuvlhSD6LKNbqEisk5Eu57IAMOqFaK6 11IYxXudX59zse88 X-Received: by 10.223.139.156 with SMTP id o28mr1230983wra.188.1504100279872; Wed, 30 Aug 2017 06:37:59 -0700 (PDT) Received: from localhost.localdomain (bzq-166-168-31-246.red.bezeqint.net. [31.168.166.246]) by smtp.gmail.com with ESMTPSA id q201sm4408100wmg.48.2017.08.30.06.37.57 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Wed, 30 Aug 2017 06:37:58 -0700 (PDT) From: Amir Goldstein To: "Darrick J . Wong" , Christoph Hellwig , Dave Chinner Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, Josef Bacik , stable@vger.kernel.org Subject: [PATCH] xfs: fix incorrect log_flushed on fsync Date: Wed, 30 Aug 2017 16:38:22 +0300 Message-Id: <1504100302-3297-1-git-send-email-amir73il@gmail.com> X-Mailer: git-send-email 2.7.4 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When calling into _xfs_log_force{,_lsn}() with a pointer to log_flushed variable, log_flushed will be set to 1 if: 1. xlog_sync() is called to flush the active log buffer AND/OR 2. xlog_wait() is called to wait on a syncing log buffers xfs_file_fsync() checks the value of log_flushed after _xfs_log_force_lsn() call to optimize away an explicit PREFLUSH request to the data block device after writing out all the file's pages to disk. This optimization is incorrect in the following sequence of events: Task A Task B ------------------------------------------------------- xfs_file_fsync() _xfs_log_force_lsn() xlog_sync() [submit PREFLUSH] xfs_file_fsync() file_write_and_wait_range() [submit WRITE X] [endio WRITE X] _xfs_log_force_lsn() xlog_wait() [endio PREFLUSH] The write X is not guarantied to be on persistent storage when PREFLUSH request in completed, because write A was submitted after the PREFLUSH request, but xfs_file_fsync() of task A will be notified of log_flushed=1 and will skip explicit flush. If the system crashes after fsync of task A, write X may not be present on disk after reboot. This bug was discovered and demonstrated using Josef Bacik's dm-log-writes target, which can be used to record block io operations and then replay a subset of these operations onto the target device. The test goes something like this: - Use fsx to execute ops of a file and record ops on log device - Every now and then fsync the file, store md5 of file and mark the location in the log - Then replay log onto device for each mark, mount fs and compare md5 of file to stored value Cc: Christoph Hellwig Cc: Josef Bacik Cc: Signed-off-by: Amir Goldstein Reviewed-by: Darrick J. Wong --- Christoph, Dave, It's hard to believe, but I think the reported bug has been around since 2005 f538d4da8d52 ("[XFS] write barrier support"), but I did not try to test old kernels. I tripped over this aleged bug a week ago when I started testing crash consistecy with dm-log-writes: https://marc.info/?l=linux-fsdevel&m=150350333427447&w=2 Since then, I have been trying to prove that dm-log-writes was false accusing, but turned out that it wasn't. The reproducer is an xfstest, which I am going to re-post soon and is also available here: https://github.com/amir73il/xfstests/commits/dm-log-writes On a system with spinning disk it reproduces the issue with close 100% probability within less than a minute. Anyway, with Darrick going on vacation, let's expedite review of this fix and try to merge some fix for v4.14-rc1 (although this bug fix may be eligible to later rc's as well). The change obviously changes behavior for the worse for workload of intensive parallel fsyncing tasks, but I don't see another quick way to fix correctness without hurting this workload. It might be worth to get a feedback from file_write_and_wait_range(), whether dirty pages writes have been issued at all. I started running full xftests cycle, but wanted to send this one out early for comments. I can test other patches if needed. Cheers, Amir. fs/xfs/xfs_log.c | 7 ------- 1 file changed, 7 deletions(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 0053bcf..ec159c5 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -3342,8 +3342,6 @@ _xfs_log_force( */ if (iclog->ic_state & XLOG_STATE_IOERROR) return -EIO; - if (log_flushed) - *log_flushed = 1; } else { no_sleep: @@ -3447,8 +3445,6 @@ _xfs_log_force_lsn( xlog_wait(&iclog->ic_prev->ic_write_wait, &log->l_icloglock); - if (log_flushed) - *log_flushed = 1; already_slept = 1; goto try_again; } @@ -3482,9 +3478,6 @@ _xfs_log_force_lsn( */ if (iclog->ic_state & XLOG_STATE_IOERROR) return -EIO; - - if (log_flushed) - *log_flushed = 1; } else { /* just return */ spin_unlock(&log->l_icloglock); }