[0/2] fs: Hole punch vs page cache filling races

Message ID: 20190603132155.20600-1-jack@suse.cz (mailing list archive)
Headers:
    Return-Path: <linux-fsdevel-owner@kernel.org>
    From: Jan Kara <jack@suse.cz>
    To: <linux-ext4@vger.kernel.org>
    Cc: Ted Tso <tytso@mit.edu>, <linux-mm@kvack.org>,
            <linux-fsdevel@vger.kernel.org>,
            Amir Goldstein <amir73il@gmail.com>, Jan Kara <jack@suse.cz>
    Subject: [PATCH 0/2] fs: Hole punch vs page cache filling races
    Date: Mon, 3 Jun 2019 15:21:53 +0200
    Message-Id: <20190603132155.20600-1-jack@suse.cz>
    Sender: linux-fsdevel-owner@vger.kernel.org
    Precedence: bulk
Series: fs: Hole punch vs page cache filling races
    [0/2] fs: Hole punch vs page cache filling races
    [1/2] mm: Add readahead file operation
    [2/2] ext4: Fix stale data exposure when read races with hole punch

Message

Jan Kara June 3, 2019, 1:21 p.m. UTC

Hello,

Amir has reported [1] that ext4 has a potential issue: reads can race with
hole punching, possibly exposing stale data from freed blocks or even corrupting
the filesystem when stale mapping data gets used for writeout. The problem is that
during hole punching, new page cache pages can get instantiated in a punched
range after truncate_inode_pages() has run but before the filesystem removes
blocks from the file.  In principle any filesystem implementing hole punching
thus needs to implement a mechanism to block instantiating page cache pages
during hole punching to avoid this race. This is further complicated by the
fact that there are multiple places that can instantiate pages in page cache.
We can have regular read(2) or page fault doing this but fadvise(2) or
madvise(2) can also result in reading in page cache pages through
force_page_cache_readahead().
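
To make the window concrete, here is a minimal single-threaded toy model of
the interleaving described above. It is only a sketch: struct toy_inode and
all toy_* functions are hypothetical names made up for illustration, not
kernel code. toy_truncate_pages() stands in for truncate_inode_pages(),
toy_free_blocks() for the filesystem removing blocks, and toy_fill_page() for
any page cache filler (read(2), page fault, readahead).

```c
#include <stdbool.h>

#define NPAGES   4
#define NO_BLOCK (-1)

/* Toy inode: one block mapping and one page cache slot per file page. */
struct toy_inode {
	int  block[NPAGES];        /* block backing each page, or NO_BLOCK */
	bool cached[NPAGES];       /* is a page cache page instantiated? */
	int  cached_from[NPAGES];  /* block the cached page was filled from */
};

/* Stand-in for any page cache filler: read(2), page fault, readahead. */
static void toy_fill_page(struct toy_inode *ino, int idx)
{
	if (ino->cached[idx])
		return;
	ino->cached[idx] = true;
	ino->cached_from[idx] = ino->block[idx];
}

/* Hole punch, step 1: drop cached pages in [first, last]. */
static void toy_truncate_pages(struct toy_inode *ino, int first, int last)
{
	for (int i = first; i <= last; i++)
		ino->cached[i] = false;
}

/* Hole punch, step 2: the filesystem removes the blocks in [first, last]. */
static void toy_free_blocks(struct toy_inode *ino, int first, int last)
{
	for (int i = first; i <= last; i++)
		ino->block[i] = NO_BLOCK;
}

/* A cached page is stale if it was filled from a block the file no longer owns. */
static bool toy_page_is_stale(const struct toy_inode *ino, int idx)
{
	return ino->cached[idx] && ino->cached_from[idx] != ino->block[idx];
}

/*
 * Punch a hole over pages 1-2 and let a reader touch page 1.
 * With racy == true the reader runs between the two punch steps;
 * otherwise it runs after the punch has finished.
 * Returns true if page 1 ends up stale.
 */
static bool toy_replay(bool racy)
{
	struct toy_inode ino;

	for (int i = 0; i < NPAGES; i++) {
		ino.block[i] = 100 + i;
		ino.cached[i] = false;
		ino.cached_from[i] = NO_BLOCK;
	}

	toy_truncate_pages(&ino, 1, 2);
	if (racy)
		toy_fill_page(&ino, 1);   /* reader slips into the window */
	toy_free_blocks(&ino, 1, 2);
	if (!racy)
		toy_fill_page(&ino, 1);   /* reader sees the hole */

	return toy_page_is_stale(&ino, 1);
}
```

In this model toy_replay(true) leaves page 1 cached from a block the file no
longer owns, while toy_replay(false) does not.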

This patch set fixes the problem for ext4 by protecting all page cache filling
operations with EXT4_I(inode)->i_mmap_lock. To be able to do that for
readahead, we introduce a new ->readahead file operation and a corresponding
vfs_readahead() helper. Note that e.g. ->readpages() cannot be used for getting
the appropriate lock - we also need to protect ordinary read path using
->readpage() and there's no way to distinguish ->readpages() called through
->read_iter() from ->readpages() called e.g. through fadvise(2).
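
The locking scheme can be sketched as the following userspace toy model. It
is an illustration under stated assumptions, not the code from the patches:
every toy_* name is hypothetical, the signature of the hook is invented (the
real one is whatever patch 1/2 defines), and the single-threaded "trylock"
fails in the places where a real reader/writer lock would sleep.

```c
#include <stdbool.h>

/*
 * Toy reader/writer lock standing in for EXT4_I(inode)->i_mmap_lock.
 * Single-threaded model: the "try" variants fail where a real lock
 * would make the caller wait.
 */
struct toy_rwlock {
	int  readers;
	bool writer;
};

static bool toy_read_trylock(struct toy_rwlock *l)
{
	if (l->writer)
		return false;
	l->readers++;
	return true;
}

static void toy_read_unlock(struct toy_rwlock *l)
{
	l->readers--;
}

static bool toy_write_trylock(struct toy_rwlock *l)
{
	if (l->writer || l->readers > 0)
		return false;
	l->writer = true;
	return true;
}

static void toy_write_unlock(struct toy_rwlock *l)
{
	l->writer = false;
}

struct toy_file;

/* Hypothetical shape of the new hook; the argument list is an assumption. */
struct toy_file_ops {
	int (*readahead)(struct toy_file *f, long start, long nr_pages);
};

struct toy_file {
	const struct toy_file_ops *ops;
	struct toy_rwlock mmap_lock;
	long pages_read;  /* pages brought into the imaginary page cache */
};

/* Generic readahead: fills pages without any filesystem locking. */
static int toy_generic_readahead(struct toy_file *f, long start, long nr_pages)
{
	(void)start;
	f->pages_read += nr_pages;
	return 0;
}

/* Filesystem hook: fill the page cache only under the shared lock. */
static int toy_fs_readahead(struct toy_file *f, long start, long nr_pages)
{
	int ret;

	if (!toy_read_trylock(&f->mmap_lock))
		return -1;  /* a real implementation would wait here */
	ret = toy_generic_readahead(f, start, nr_pages);
	toy_read_unlock(&f->mmap_lock);
	return ret;
}

/* Models a vfs_readahead()-style helper: use the hook if the fs has one. */
static int toy_vfs_readahead(struct toy_file *f, long start, long nr_pages)
{
	if (f->ops && f->ops->readahead)
		return f->ops->readahead(f, start, nr_pages);
	return toy_generic_readahead(f, start, nr_pages);
}

/*
 * Returns true if readahead is kept out while a hole punch holds the
 * lock exclusively, and succeeds again once the punch has finished.
 */
static bool toy_punch_excludes_readahead(void)
{
	static const struct toy_file_ops ops = { .readahead = toy_fs_readahead };
	struct toy_file f = { .ops = &ops };
	bool blocked;

	if (!toy_write_trylock(&f.mmap_lock))
		return false;
	/* Page truncation and block removal would both happen in here. */
	blocked = toy_vfs_readahead(&f, 0, 8) != 0 && f.pages_read == 0;
	toy_write_unlock(&f.mmap_lock);

	return blocked && toy_vfs_readahead(&f, 0, 8) == 0 && f.pages_read == 8;
}
```

The point of routing through a per-file hook in this sketch is that the
filesystem gets to take its lock once, at the entry point of the readahead
request, regardless of which syscall triggered it.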

Other filesystems (e.g. XFS, F2FS, GFS2, OCFS2, ...) need a similar fix. I can
write some (e.g. for XFS) once we agree that the ->readahead operation is indeed
the way to fix this.

								Honza

[1] https://lore.kernel.org/linux-fsdevel/CAOQ4uxjQNmxqmtA_VbYW0Su9rKRk2zobJmahcyeaEVOFKVQ5dw@mail.gmail.com/