iomap: Revert "fs/iomap.c: get/put the page in iomap_page_create/release()"
diff mbox series

Message ID 20181220122324.17082-1-david@fromorbit.com
State New
Headers show
Series
  • iomap: Revert "fs/iomap.c: get/put the page in iomap_page_create/release()"
Related show

Commit Message

Dave Chinner Dec. 20, 2018, 12:23 p.m. UTC
From: Dave Chinner <dchinner@redhat.com>

This reverts commit 61c6de667263184125d5ca75e894fcad632b0dd3.

The reverted commit added page reference counting to iomap page
structures that are used to track block size < page size state. This
was supposed to align the code with page migration page accounting
assumptions, but what it has done instead is break XFS filesystems.
Every fstests run I've done on sub-page block size XFS filesystems
has since picking up this commit 2 days ago has failed with bad page
state errors such as:

# ./run_check.sh "-m rmapbt=1,reflink=1 -i sparse=1 -b size=1k" "generic/038"
....
SECTION       -- xfs
FSTYP         -- xfs (debug)
PLATFORM      -- Linux/x86_64 test1 4.20.0-rc6-dgc+
MKFS_OPTIONS  -- -f -m rmapbt=1,reflink=1 -i sparse=1 -b size=1k /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /mnt/scratch

generic/038 454s ...
 run fstests generic/038 at 2018-12-20 18:43:05
 XFS (sdc): Unmounting Filesystem
 XFS (sdc): Mounting V5 Filesystem
 XFS (sdc): Ending clean mount
 BUG: Bad page state in process kswapd0  pfn:3a7fa
 page:ffffea0000ccbeb0 count:0 mapcount:0 mapping:ffff88800d9b6360 index:0x1
 flags: 0xfffffc0000000()
 raw: 000fffffc0000000 dead000000000100 dead000000000200 ffff88800d9b6360
 raw: 0000000000000001 0000000000000000 00000000ffffffff
 page dumped because: non-NULL mapping
 CPU: 0 PID: 676 Comm: kswapd0 Not tainted 4.20.0-rc6-dgc+ #915
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
 Call Trace:
  dump_stack+0x67/0x90
  bad_page.cold.116+0x8a/0xbd
  free_pcppages_bulk+0x4bf/0x6a0
  free_unref_page_list+0x10f/0x1f0
  shrink_page_list+0x49d/0xf50
  shrink_inactive_list+0x19d/0x3b0
  shrink_node_memcg.constprop.77+0x398/0x690
  ? shrink_slab.constprop.81+0x278/0x3f0
  shrink_node+0x7a/0x2f0
  kswapd+0x34b/0x6d0
  ? node_reclaim+0x240/0x240
  kthread+0x11f/0x140
  ? __kthread_bind_mask+0x60/0x60
  ret_from_fork+0x24/0x30
 Disabling lock debugging due to kernel taint
....

The failures are from anyway that frees pages and empties the
per-cpu page magazines, so it's not a predictable failure or an easy
to debug failure.

generic/038 is a reliable reproducer of this problem - it has a 9 in
10 failure rate on one of my test machines. Failure on other
machines have been at random points in fstests runs but every run
has ended up tripping this problem. Hence generic/038 was used to
bisect the failure because it was the most reliable failure.

It is too close to the 4.20 release (not to mention holidays) to
try to diagnose, fix and test the underlying cause of the problem,
so reverting the commit is the only option we have right now. The
revert has been tested against a current tot 4.20-rc7+ kernel across
multiple machines running sub-page block size XFs filesystems and
none of the bad page state failures have been seen.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/iomap.c | 7 -------
 1 file changed, 7 deletions(-)

Comments

Greg Kroah-Hartman Dec. 20, 2018, 12:28 p.m. UTC | #1
On Thu, Dec 20, 2018 at 11:23:24PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> This reverts commit 61c6de667263184125d5ca75e894fcad632b0dd3.
> 
> The reverted commit added page reference counting to iomap page
> structures that are used to track block size < page size state. This
> was supposed to align the code with page migration page accounting
> assumptions, but what it has done instead is break XFS filesystems.
> Every fstests run I've done on sub-page block size XFS filesystems
> has since picking up this commit 2 days ago has failed with bad page
> state errors such as:
> 
> # ./run_check.sh "-m rmapbt=1,reflink=1 -i sparse=1 -b size=1k" "generic/038"
> ....
> SECTION       -- xfs
> FSTYP         -- xfs (debug)
> PLATFORM      -- Linux/x86_64 test1 4.20.0-rc6-dgc+
> MKFS_OPTIONS  -- -f -m rmapbt=1,reflink=1 -i sparse=1 -b size=1k /dev/sdc
> MOUNT_OPTIONS -- /dev/sdc /mnt/scratch
> 
> generic/038 454s ...
>  run fstests generic/038 at 2018-12-20 18:43:05
>  XFS (sdc): Unmounting Filesystem
>  XFS (sdc): Mounting V5 Filesystem
>  XFS (sdc): Ending clean mount
>  BUG: Bad page state in process kswapd0  pfn:3a7fa
>  page:ffffea0000ccbeb0 count:0 mapcount:0 mapping:ffff88800d9b6360 index:0x1
>  flags: 0xfffffc0000000()
>  raw: 000fffffc0000000 dead000000000100 dead000000000200 ffff88800d9b6360
>  raw: 0000000000000001 0000000000000000 00000000ffffffff
>  page dumped because: non-NULL mapping
>  CPU: 0 PID: 676 Comm: kswapd0 Not tainted 4.20.0-rc6-dgc+ #915
>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
>  Call Trace:
>   dump_stack+0x67/0x90
>   bad_page.cold.116+0x8a/0xbd
>   free_pcppages_bulk+0x4bf/0x6a0
>   free_unref_page_list+0x10f/0x1f0
>   shrink_page_list+0x49d/0xf50
>   shrink_inactive_list+0x19d/0x3b0
>   shrink_node_memcg.constprop.77+0x398/0x690
>   ? shrink_slab.constprop.81+0x278/0x3f0
>   shrink_node+0x7a/0x2f0
>   kswapd+0x34b/0x6d0
>   ? node_reclaim+0x240/0x240
>   kthread+0x11f/0x140
>   ? __kthread_bind_mask+0x60/0x60
>   ret_from_fork+0x24/0x30
>  Disabling lock debugging due to kernel taint
> ....
> 
> The failures are from anyway that frees pages and empties the
> per-cpu page magazines, so it's not a predictable failure or an easy
> to debug failure.
> 
> generic/038 is a reliable reproducer of this problem - it has a 9 in
> 10 failure rate on one of my test machines. Failure on other
> machines have been at random points in fstests runs but every run
> has ended up tripping this problem. Hence generic/038 was used to
> bisect the failure because it was the most reliable failure.
> 
> It is too close to the 4.20 release (not to mention holidays) to
> try to diagnose, fix and test the underlying cause of the problem,
> so reverting the commit is the only option we have right now. The
> revert has been tested against a current tot 4.20-rc7+ kernel across
> multiple machines running sub-page block size XFs filesystems and
> none of the bad page state failures have been seen.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/iomap.c | 7 -------
>  1 file changed, 7 deletions(-)

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
    https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>
Christoph Hellwig Dec. 20, 2018, 5:41 p.m. UTC | #2
Lets fix the problem instead.  I'll dig into it.
Dave Chinner Dec. 20, 2018, 8:53 p.m. UTC | #3
On Thu, Dec 20, 2018 at 06:41:30PM +0100, Christoph Hellwig wrote:
> Lets fix the problem instead.  I'll dig into it.

I'd say we still need to revert this for 4.20 which is due for
release in a couple of days. Most of us are going to be on holidays
and away from computers for the next week, so I don't think we
have time to adequately review and test a new fix (which doesn't
exist yet) before the release occurs.

Cheers,

Dave.
Dave Chinner Dec. 20, 2018, 8:57 p.m. UTC | #4
On Thu, Dec 20, 2018 at 01:28:25PM +0100, Greg KH wrote:
> On Thu, Dec 20, 2018 at 11:23:24PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > This reverts commit 61c6de667263184125d5ca75e894fcad632b0dd3.

<snip>

> <formletter>
> 
> This is not the correct way to submit patches for inclusion in the
> stable kernel tree.  Please read:
>     https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> for how to do this properly.
> 
> </formletter>

This wasn't a stable kernel patch submission. This was to let the
stable kernel maintainers know that a commit that was submitted for
stable kernel inclusion (via a cc stable in the commit message) is
buggy and should not be backported. i.e. if you've already queued/
backported 61c6de6672 then you need to drop it from your trees.

Cheers,

Dave.
Linus Torvalds Dec. 20, 2018, 10:42 p.m. UTC | #5
On Thu, Dec 20, 2018 at 12:53 PM Dave Chinner <david@fromorbit.com> wrote:
>
> I'd say we still need to revert this for 4.20 which is due for
> release in a couple of days. Most of us are going to be on holidays
> and away from computers for the next week, so I don't think we
> have time to adequately review and test a new fix (which doesn't
> exist yet) before the release occurs.

Yeah. I already applied the revert. If somebody finds a "duh" moment,
and an alternate fix gets posted and tested we can revert the revert
and fix it properly, but for 4.20 (and for xmas) I do think that just
going back to the previous state is otherwise the right choice.

              Linus
Greg Kroah-Hartman Dec. 21, 2018, 7:40 a.m. UTC | #6
On Fri, Dec 21, 2018 at 07:57:21AM +1100, Dave Chinner wrote:
> On Thu, Dec 20, 2018 at 01:28:25PM +0100, Greg KH wrote:
> > On Thu, Dec 20, 2018 at 11:23:24PM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > This reverts commit 61c6de667263184125d5ca75e894fcad632b0dd3.
> 
> <snip>
> 
> > <formletter>
> > 
> > This is not the correct way to submit patches for inclusion in the
> > stable kernel tree.  Please read:
> >     https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> > for how to do this properly.
> > 
> > </formletter>
> 
> This wasn't a stable kernel patch submission. This was to let the
> stable kernel maintainers know that a commit that was submitted for
> stable kernel inclusion (via a cc stable in the commit message) is
> buggy and should not be backported. i.e. if you've already queued/
> backported 61c6de6672 then you need to drop it from your trees.

As the commit 61c6de667263 ("fs/iomap.c: get/put the page in
iomap_page_create/release()") was tagged for a stable release (and is in
4.19.11) it would be nice to also add the cc: stable in the revert.
When you just blindly send it to the stable mailing list, we have no
idea what to do with it, hence the automated form letter that was sent.

Now that this is in Linus's tree, I'll dig out the revert and remember
to queue it up for the next round of stable updates, thanks for letting
us know.

greg k-h
Christoph Hellwig Dec. 21, 2018, 9:39 a.m. UTC | #7
On Thu, Dec 20, 2018 at 02:42:15PM -0800, Linus Torvalds wrote:
> Yeah. I already applied the revert. If somebody finds a "duh" moment,
> and an alternate fix gets posted and tested we can revert the revert
> and fix it properly, but for 4.20 (and for xmas) I do think that just
> going back to the previous state is otherwise the right choice.

As far as I can tell the commit simply missed updating the recounts
in the actual migrate callback for which it was added.  The fixed
version looks something like this (still running xfstests):

diff --git a/fs/iomap.c b/fs/iomap.c
index d6bc98ae8d35..20c9c1cadd4e 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -116,6 +116,12 @@ iomap_page_create(struct inode *inode, struct page *page)
 	atomic_set(&iop->read_count, 0);
 	atomic_set(&iop->write_count, 0);
 	bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
+
+	/*
+	 * migrate_page_move_mapping() assumes that pages with private data have
+	 * their count elevated by 1.
+	 */
+	get_page(page);
 	set_page_private(page, (unsigned long)iop);
 	SetPagePrivate(page);
 	return iop;
@@ -132,6 +138,7 @@ iomap_page_release(struct page *page)
 	WARN_ON_ONCE(atomic_read(&iop->write_count));
 	ClearPagePrivate(page);
 	set_page_private(page, 0);
+	put_page(page);
 	kfree(iop);
 }
 
@@ -556,8 +563,10 @@ iomap_migrate_page(struct address_space *mapping, struct page *newpage,
 
 	if (page_has_private(page)) {
 		ClearPagePrivate(page);
+		get_page(newpage);
 		set_page_private(newpage, page_private(page));
 		set_page_private(page, 0);
+		put_page(page);
 		SetPagePrivate(newpage);
 	}
Linus Torvalds Dec. 21, 2018, 7:18 p.m. UTC | #8
Dave,
 any chance you can validate Christoph's patch and we could re-apply the fix?

             Linus

On Fri, Dec 21, 2018 at 1:39 AM Christoph Hellwig <hch@lst.de> wrote:
>
> As far as I can tell the commit simply missed updating the recounts
> in the actual migrate callback for which it was added.  The fixed
> version looks something like this (still running xfstests):
Christoph Hellwig Dec. 21, 2018, 7:20 p.m. UTC | #9
On Fri, Dec 21, 2018 at 11:18:35AM -0800, Linus Torvalds wrote:
> Dave,
>  any chance you can validate Christoph's patch and we could re-apply the fix?

I think Dave being in Australia is probably off to his well earned
weekend, like I should here in Europe now :)

Given that it took so long to find the original issue I think we should
not worry about 4.20 and just get the fix into 4.21-rc early and add it
to stable.
Linus Torvalds Dec. 21, 2018, 7:26 p.m. UTC | #10
On Fri, Dec 21, 2018 at 11:20 AM Christoph Hellwig <hch@lst.de> wrote:
>
> Given that it took so long to find the original issue I think we should
> not worry about 4.20 and just get the fix into 4.21-rc early and add it
> to stable.

Fair enough, I'll stop worrying ;)

             Linus
Dave Chinner Dec. 21, 2018, 9:58 p.m. UTC | #11
On Fri, Dec 21, 2018 at 11:18:35AM -0800, Linus Torvalds wrote:
> Dave,
>  any chance you can validate Christoph's patch and we could re-apply the fix?

Not for a few days - I'm out of the office until after xmas now.

I did notice the same bug in the original patch yesterday (and
mentioned it to Darrick on #xfs) when doing post-mortem analysis,
but I couldn't tell if that was the only problem all my test
machines were tied up testing that the revert didn't
introduce/uncover any new problems...

I'll test it as soon as I'm back around the office so we can get it
in -rc1 if there's no problems.

Cheers,

Dave.
Eric Sandeen Dec. 21, 2018, 10:06 p.m. UTC | #12
On 12/21/18 1:40 AM, Greg KH wrote:
> On Fri, Dec 21, 2018 at 07:57:21AM +1100, Dave Chinner wrote:
>> On Thu, Dec 20, 2018 at 01:28:25PM +0100, Greg KH wrote:
>>> On Thu, Dec 20, 2018 at 11:23:24PM +1100, Dave Chinner wrote:
>>>> From: Dave Chinner <dchinner@redhat.com>
>>>>
>>>> This reverts commit 61c6de667263184125d5ca75e894fcad632b0dd3.
>>
>> <snip>
>>
>>> <formletter>
>>>
>>> This is not the correct way to submit patches for inclusion in the
>>> stable kernel tree.  Please read:
>>>     https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
>>> for how to do this properly.
>>>
>>> </formletter>
>>
>> This wasn't a stable kernel patch submission. This was to let the
>> stable kernel maintainers know that a commit that was submitted for
>> stable kernel inclusion (via a cc stable in the commit message) is
>> buggy and should not be backported. i.e. if you've already queued/
>> backported 61c6de6672 then you need to drop it from your trees.
> 
> As the commit 61c6de667263 ("fs/iomap.c: get/put the page in
> iomap_page_create/release()") was tagged for a stable release (and is in
> 4.19.11) it would be nice to also add the cc: stable in the revert.
> When you just blindly send it to the stable mailing list, we have no
> idea what to do with it, hence the automated form letter that was sent.
> 
> Now that this is in Linus's tree, I'll dig out the revert and remember
> to queue it up for the next round of stable updates, thanks for letting
> us know.

As a side note from an interested observer, I had no idea that patches
from not-yet-released Linus kernels could make it into a "stable" kernel
within 5 days of Linus' commit to an -rc.  That seems shockingly fast,
with very little margin for error.

-Eric
Sasha Levin Dec. 22, 2018, 12:04 a.m. UTC | #13
On Fri, Dec 21, 2018 at 08:40:42AM +0100, Greg KH wrote:
>On Fri, Dec 21, 2018 at 07:57:21AM +1100, Dave Chinner wrote:
>> On Thu, Dec 20, 2018 at 01:28:25PM +0100, Greg KH wrote:
>> > On Thu, Dec 20, 2018 at 11:23:24PM +1100, Dave Chinner wrote:
>> > > From: Dave Chinner <dchinner@redhat.com>
>> > >
>> > > This reverts commit 61c6de667263184125d5ca75e894fcad632b0dd3.
>>
>> <snip>
>>
>> > <formletter>
>> >
>> > This is not the correct way to submit patches for inclusion in the
>> > stable kernel tree.  Please read:
>> >     https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
>> > for how to do this properly.
>> >
>> > </formletter>
>>
>> This wasn't a stable kernel patch submission. This was to let the
>> stable kernel maintainers know that a commit that was submitted for
>> stable kernel inclusion (via a cc stable in the commit message) is
>> buggy and should not be backported. i.e. if you've already queued/
>> backported 61c6de6672 then you need to drop it from your trees.
>
>As the commit 61c6de667263 ("fs/iomap.c: get/put the page in
>iomap_page_create/release()") was tagged for a stable release (and is in
>4.19.11) it would be nice to also add the cc: stable in the revert.
>When you just blindly send it to the stable mailing list, we have no
>idea what to do with it, hence the automated form letter that was sent.
>
>Now that this is in Linus's tree, I'll dig out the revert and remember
>to queue it up for the next round of stable updates, thanks for letting
>us know.

I've queued the revert for 4.19.

--
Thanks,
Sasha

Patch
diff mbox series

diff --git a/fs/iomap.c b/fs/iomap.c
index 5bc172f3dfe8..d6bc98ae8d35 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -116,12 +116,6 @@  iomap_page_create(struct inode *inode, struct page *page)
 	atomic_set(&iop->read_count, 0);
 	atomic_set(&iop->write_count, 0);
 	bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
-
-	/*
-	 * migrate_page_move_mapping() assumes that pages with private data have
-	 * their count elevated by 1.
-	 */
-	get_page(page);
 	set_page_private(page, (unsigned long)iop);
 	SetPagePrivate(page);
 	return iop;
@@ -138,7 +132,6 @@  iomap_page_release(struct page *page)
 	WARN_ON_ONCE(atomic_read(&iop->write_count));
 	ClearPagePrivate(page);
 	set_page_private(page, 0);
-	put_page(page);
 	kfree(iop);
 }