diff mbox

xfs: during log recovery, destroy the unlinked inodes even for read-only mount

Message ID 1481014847-22242-1-git-send-email-houtao1@huawei.com
State New, archived
Headers show

Commit Message

Hou Tao Dec. 6, 2016, 9 a.m. UTC
During the 2nd stage of log recovery, if the filesystem is firstly mounted
as read-only, the unlink inodes will not be destroyed and the unlinked list
in AGI will not be cleared. Even after a read-write remount or umount,
the unlinked inodes will still be valid and be kept on disk, and the
available freespace will be incorrect.

To fix the problem, we need to force xfs_inactive() to destroy the
unlinked inode when the filesystem is mounted as read-only.
So clear the XFS_MOUNT_RDONLY flag temporarily before the recovery
of unlinked inodes and restore the flag after the recovery has done.

The problem can be reproduced by the following steps:
1. mount a xfs fs on a KVM VM
2. on the VM launch an application which does the following things:
   open(xfs_file); unlink(xfs_file);
   while(1) { write(xfs_file, 2MB); sleep(1); }
3. wait 5 seconds, sync the xfs fs, and wait 5 seconds
4. terminate the VM
5. start the VM and mount the xfs as read-only
6. remount the xfs as read-write or umount
7. check the unlinked list and the available freespace

Cc: <stable@vger.kernel.org>        [3.10+]
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 fs/xfs/xfs_log_recover.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Dave Chinner Dec. 7, 2016, 6:30 a.m. UTC | #1
On Tue, Dec 06, 2016 at 05:00:47PM +0800, Hou Tao wrote:
> During the 2nd stage of log recovery, if the filesystem is firstly mounted
> as read-only, the unlink inodes will not be destroyed and the unlinked list
> in AGI will not be cleared. Even after a read-write remount or umount,
> the unlinked inodes will still be valid and be kept on disk, and the
> available freespace will be incorrect.
> 
> To fix the problem, we need to force xfs_inactive() to destroy the
> unlinked inode when the filesystem is mounted as read-only.
> So clear the XFS_MOUNT_RDONLY flag temporarily before the recovery
> of unlinked inodes and restore the flag after the recovery has done.
> 
> The problem can be reproduced by the following steps:
> 1. mount a xfs fs on a KVM VM
> 2. on the VM launch an application which does the following things:
>    open(xfs_file); unlink(xfs_file);
>    while(1) { write(xfs_file, 2MB); sleep(1); }
> 3. wait 5 seconds, sync the xfs fs, and wait 5 seconds
> 4. terminate the VM
> 5. start the VM and mount the xfs as read-only
> 6. remount the xfs as read-write or umount
> 7. check the unlinked list and the available freespace

This is only papering over the larger problem.

I was talking to Eric about this larger "recovery on read-only
mount" problem last week on IRC - I can't find it my logs right now,
but IIRC I'd suggested that we should always run xfs_mountfs()
in read-write mount if the underlying device can be written to,
and then once that is complete do a rw->ro transition exactly as we
do for a remount,ro operation.

That way we remove all the special "write on read only mount" hacks
we currently have throughout the code to enable log recovery to run
on read-only mounts.

Essentially is requires moving the device read only check from the
log code to xfs_fs_fill_super() and handling the no-recovery flag
there before we call xfs_mountfs() and adding the rw->ro state
transition after we return. This will be much simpler and much more
reliable than trying to turn off "read only" state around certain
operations...

Cheers,

Dave.
Eric Sandeen Dec. 15, 2016, 4:07 a.m. UTC | #2
On 12/7/16 12:30 AM, Dave Chinner wrote:
> On Tue, Dec 06, 2016 at 05:00:47PM +0800, Hou Tao wrote:
>> During the 2nd stage of log recovery, if the filesystem is firstly mounted
>> as read-only, the unlink inodes will not be destroyed and the unlinked list
>> in AGI will not be cleared. Even after a read-write remount or umount,
>> the unlinked inodes will still be valid and be kept on disk, and the
>> available freespace will be incorrect.
>>
>> To fix the problem, we need to force xfs_inactive() to destroy the
>> unlinked inode when the filesystem is mounted as read-only.
>> So clear the XFS_MOUNT_RDONLY flag temporarily before the recovery
>> of unlinked inodes and restore the flag after the recovery has done.
>>
>> The problem can be reproduced by the following steps:
>> 1. mount a xfs fs on a KVM VM
>> 2. on the VM launch an application which does the following things:
>>    open(xfs_file); unlink(xfs_file);
>>    while(1) { write(xfs_file, 2MB); sleep(1); }
>> 3. wait 5 seconds, sync the xfs fs, and wait 5 seconds
>> 4. terminate the VM
>> 5. start the VM and mount the xfs as read-only
>> 6. remount the xfs as read-write or umount
>> 7. check the unlinked list and the available freespace
> 
> This is only papering over the larger problem.
> 
> I was talking to Eric about this larger "recovery on read-only
> mount" problem last week on IRC - I can't find it my logs right now,
> but IIRC I'd suggested that we should always run xfs_mountfs()
> in read-write mount if the underlying device can be written to,
> and then once that is complete do a rw->ro transition exactly as we
> do for a remount,ro operation.

Yeah, I have a larger patchset to try to handle this and other
related processing that wasn't happening on ro mounts.  I got
derailed because my regression test for it ran into all kinds
of unexpected new & unrelated bugs.  So I haven't sent it yet...

There were lots of little bits here and there stemming, I think,
from old Irix code that didn't do /any/ device IO on a ro mount.

-Eric

> That way we remove all the special "write on read only mount" hacks
> we currently have throughout the code to enable log recovery to run
> on read-only mounts.
> 
> Essentially is requires moving the device read only check from the
> log code to xfs_fs_fill_super() and handling the no-recovery flag
> there before we call xfs_mountfs() and adding the rw->ro state
> transition after we return. This will be much simpler and much more
> reliable than trying to turn off "read only" state around certain
> operations...
> 
> Cheers,
> 
> Dave.
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hou Tao Dec. 15, 2016, 11:36 a.m. UTC | #3
>> I was talking to Eric about this larger "recovery on read-only
>> mount" problem last week on IRC - I can't find it my logs right now,
>> but IIRC I'd suggested that we should always run xfs_mountfs()
>> in read-write mount if the underlying device can be written to,
>> and then once that is complete do a rw->ro transition exactly as we
>> do for a remount,ro operation.
> 
> Yeah, I have a larger patchset to try to handle this and other
> related processing that wasn't happening on ro mounts.  I got
> derailed because my regression test for it ran into all kinds
> of unexpected new & unrelated bugs.  So I haven't sent it yet...
> 
> There were lots of little bits here and there stemming, I think,
> from old Irix code that didn't do /any/ device IO on a ro mount.
> 
> -Eric
It's glad to hear that you have nearly completed the ro mount patchset,
so I will stop reworking on this problem.

During the rework, I found two puzzling XFS_MOUNT_RDONLY checkings
at xfs_release(...) and xfs_inactive(...). In my opinion these check
should be done at VFS instead of the specific filesystem, so why
these XFS_MOUNT_RDONLY checkings there are ? Could we move the
needed check to VFS just like the things sb_prepare_remount_readonly()
have done ?

Tao


--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen Feb. 8, 2017, 2:31 a.m. UTC | #4
On 12/15/16 5:36 AM, Hou Tao wrote:
>>> I was talking to Eric about this larger "recovery on read-only
>>> mount" problem last week on IRC - I can't find it my logs right now,
>>> but IIRC I'd suggested that we should always run xfs_mountfs()
>>> in read-write mount if the underlying device can be written to,
>>> and then once that is complete do a rw->ro transition exactly as we
>>> do for a remount,ro operation.
>>
>> Yeah, I have a larger patchset to try to handle this and other
>> related processing that wasn't happening on ro mounts.  I got
>> derailed because my regression test for it ran into all kinds
>> of unexpected new & unrelated bugs.  So I haven't sent it yet...
>>
>> There were lots of little bits here and there stemming, I think,
>> from old Irix code that didn't do /any/ device IO on a ro mount.
>>
>> -Eric
> It's glad to hear that you have nearly completed the ro mount patchset,
> so I will stop reworking on this problem.
> 
> During the rework, I found two puzzling XFS_MOUNT_RDONLY checkings
> at xfs_release(...) and xfs_inactive(...). In my opinion these check
> should be done at VFS instead of the specific filesystem, so why
> these XFS_MOUNT_RDONLY checkings there are ? Could we move the
> needed check to VFS just like the things sb_prepare_remount_readonly()
> have done ?

Getting back to this - the reason for your original patch, I think,
was that things like log recovery can get into this code without going
through the vfs at all, so I think vfs checks are not always sufficient.

That said, there probably are some XFS_MOUNT_RDONLY checks which are
redundant.

-Eric

> Tao
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 9b3d7c7..fcc83e0 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -5025,6 +5025,7 @@  xlog_recover_process_iunlinks(
 	int		bucket;
 	int		error;
 	uint		mp_dmevmask;
+	int ro_mount;
 
 	mp = log->l_mp;
 
@@ -5034,6 +5035,11 @@  xlog_recover_process_iunlinks(
 	mp_dmevmask = mp->m_dmevmask;
 	mp->m_dmevmask = 0;
 
+	/* Destroy the unlinked inodes even for read-only mount */
+	ro_mount = mp->m_flags & XFS_MOUNT_RDONLY;
+	if (ro_mount)
+		mp->m_flags &= ~XFS_MOUNT_RDONLY;
+
 	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
 		/*
 		 * Find the agi for this ag.
@@ -5070,6 +5076,9 @@  xlog_recover_process_iunlinks(
 		xfs_buf_rele(agibp);
 	}
 
+	if (ro_mount)
+		mp->m_flags |= XFS_MOUNT_RDONLY;
+
 	mp->m_dmevmask = mp_dmevmask;
 }