Message ID | 1481014847-22242-1-git-send-email-houtao1@huawei.com (mailing list archive) |
---|---|
State | Deferred, archived |
Headers | show |
On Tue, Dec 06, 2016 at 05:00:47PM +0800, Hou Tao wrote: > During the 2nd stage of log recovery, if the filesystem is firstly mounted > as read-only, the unlink inodes will not be destroyed and the unlinked list > in AGI will not be cleared. Even after a read-write remount or umount, > the unlinked inodes will still be valid and be kept on disk, and the > available freespace will be incorrect. > > To fix the problem, we need to force xfs_inactive() to destroy the > unlinked inode when the filesystem is mounted as read-only. > So clear the XFS_MOUNT_RDONLY flag temporarily before the recovery > of unlinked inodes and restore the flag after the recovery has done. > > The problem can be reproduced by the following steps: > 1. mount a xfs fs on a KVM VM > 2. on the VM launch an application which does the following things: > open(xfs_file); unlink(xfs_file); > while(1) { write(xfs_file, 2MB); sleep(1); } > 3. wait 5 seconds, sync the xfs fs, and wait 5 seconds > 4. terminate the VM > 5. start the VM and mount the xfs as read-only > 6. remount the xfs as read-write or umount > 7. check the unlinked list and the available freespace This is only papering over the larger problem. I was talking to Eric about this larger "recovery on read-only mount" problem last week on IRC - I can't find it my logs right now, but IIRC I'd suggested that we should always run xfs_mountfs() in read-write mount if the underlying device can be written to, and then once that is complete do a rw->ro transition exactly as we do for a remount,ro operation. That way we remove all the special "write on read only mount" hacks we currently have throughout the code to enable log recovery to run on read-only mounts. Essentially is requires moving the device read only check from the log code to xfs_fs_fill_super() and handling the no-recovery flag there before we call xfs_mountfs() and adding the rw->ro state transition after we return. This will be much simpler and much more reliable than trying to turn off "read only" state around certain operations... Cheers, Dave.
On 12/7/16 12:30 AM, Dave Chinner wrote: > On Tue, Dec 06, 2016 at 05:00:47PM +0800, Hou Tao wrote: >> During the 2nd stage of log recovery, if the filesystem is firstly mounted >> as read-only, the unlink inodes will not be destroyed and the unlinked list >> in AGI will not be cleared. Even after a read-write remount or umount, >> the unlinked inodes will still be valid and be kept on disk, and the >> available freespace will be incorrect. >> >> To fix the problem, we need to force xfs_inactive() to destroy the >> unlinked inode when the filesystem is mounted as read-only. >> So clear the XFS_MOUNT_RDONLY flag temporarily before the recovery >> of unlinked inodes and restore the flag after the recovery has done. >> >> The problem can be reproduced by the following steps: >> 1. mount a xfs fs on a KVM VM >> 2. on the VM launch an application which does the following things: >> open(xfs_file); unlink(xfs_file); >> while(1) { write(xfs_file, 2MB); sleep(1); } >> 3. wait 5 seconds, sync the xfs fs, and wait 5 seconds >> 4. terminate the VM >> 5. start the VM and mount the xfs as read-only >> 6. remount the xfs as read-write or umount >> 7. check the unlinked list and the available freespace > > This is only papering over the larger problem. > > I was talking to Eric about this larger "recovery on read-only > mount" problem last week on IRC - I can't find it my logs right now, > but IIRC I'd suggested that we should always run xfs_mountfs() > in read-write mount if the underlying device can be written to, > and then once that is complete do a rw->ro transition exactly as we > do for a remount,ro operation. Yeah, I have a larger patchset to try to handle this and other related processing that wasn't happening on ro mounts. I got derailed because my regression test for it ran into all kinds of unexpected new & unrelated bugs. So I haven't sent it yet... There were lots of little bits here and there stemming, I think, from old Irix code that didn't do /any/ device IO on a ro mount. -Eric > That way we remove all the special "write on read only mount" hacks > we currently have throughout the code to enable log recovery to run > on read-only mounts. > > Essentially is requires moving the device read only check from the > log code to xfs_fs_fill_super() and handling the no-recovery flag > there before we call xfs_mountfs() and adding the rw->ro state > transition after we return. This will be much simpler and much more > reliable than trying to turn off "read only" state around certain > operations... > > Cheers, > > Dave. > -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>> I was talking to Eric about this larger "recovery on read-only >> mount" problem last week on IRC - I can't find it my logs right now, >> but IIRC I'd suggested that we should always run xfs_mountfs() >> in read-write mount if the underlying device can be written to, >> and then once that is complete do a rw->ro transition exactly as we >> do for a remount,ro operation. > > Yeah, I have a larger patchset to try to handle this and other > related processing that wasn't happening on ro mounts. I got > derailed because my regression test for it ran into all kinds > of unexpected new & unrelated bugs. So I haven't sent it yet... > > There were lots of little bits here and there stemming, I think, > from old Irix code that didn't do /any/ device IO on a ro mount. > > -Eric It's glad to hear that you have nearly completed the ro mount patchset, so I will stop reworking on this problem. During the rework, I found two puzzling XFS_MOUNT_RDONLY checkings at xfs_release(...) and xfs_inactive(...). In my opinion these check should be done at VFS instead of the specific filesystem, so why these XFS_MOUNT_RDONLY checkings there are ? Could we move the needed check to VFS just like the things sb_prepare_remount_readonly() have done ? Tao -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 12/15/16 5:36 AM, Hou Tao wrote: >>> I was talking to Eric about this larger "recovery on read-only >>> mount" problem last week on IRC - I can't find it my logs right now, >>> but IIRC I'd suggested that we should always run xfs_mountfs() >>> in read-write mount if the underlying device can be written to, >>> and then once that is complete do a rw->ro transition exactly as we >>> do for a remount,ro operation. >> >> Yeah, I have a larger patchset to try to handle this and other >> related processing that wasn't happening on ro mounts. I got >> derailed because my regression test for it ran into all kinds >> of unexpected new & unrelated bugs. So I haven't sent it yet... >> >> There were lots of little bits here and there stemming, I think, >> from old Irix code that didn't do /any/ device IO on a ro mount. >> >> -Eric > It's glad to hear that you have nearly completed the ro mount patchset, > so I will stop reworking on this problem. > > During the rework, I found two puzzling XFS_MOUNT_RDONLY checkings > at xfs_release(...) and xfs_inactive(...). In my opinion these check > should be done at VFS instead of the specific filesystem, so why > these XFS_MOUNT_RDONLY checkings there are ? Could we move the > needed check to VFS just like the things sb_prepare_remount_readonly() > have done ? Getting back to this - the reason for your original patch, I think, was that things like log recovery can get into this code without going through the vfs at all, so I think vfs checks are not always sufficient. That said, there probably are some XFS_MOUNT_RDONLY checks which are redundant. -Eric > Tao > -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 9b3d7c7..fcc83e0 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -5025,6 +5025,7 @@ xlog_recover_process_iunlinks( int bucket; int error; uint mp_dmevmask; + int ro_mount; mp = log->l_mp; @@ -5034,6 +5035,11 @@ xlog_recover_process_iunlinks( mp_dmevmask = mp->m_dmevmask; mp->m_dmevmask = 0; + /* Destroy the unlinked inodes even for read-only mount */ + ro_mount = mp->m_flags & XFS_MOUNT_RDONLY; + if (ro_mount) + mp->m_flags &= ~XFS_MOUNT_RDONLY; + for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) { /* * Find the agi for this ag. @@ -5070,6 +5076,9 @@ xlog_recover_process_iunlinks( xfs_buf_rele(agibp); } + if (ro_mount) + mp->m_flags |= XFS_MOUNT_RDONLY; + mp->m_dmevmask = mp_dmevmask; }
During the 2nd stage of log recovery, if the filesystem is firstly mounted as read-only, the unlink inodes will not be destroyed and the unlinked list in AGI will not be cleared. Even after a read-write remount or umount, the unlinked inodes will still be valid and be kept on disk, and the available freespace will be incorrect. To fix the problem, we need to force xfs_inactive() to destroy the unlinked inode when the filesystem is mounted as read-only. So clear the XFS_MOUNT_RDONLY flag temporarily before the recovery of unlinked inodes and restore the flag after the recovery has done. The problem can be reproduced by the following steps: 1. mount a xfs fs on a KVM VM 2. on the VM launch an application which does the following things: open(xfs_file); unlink(xfs_file); while(1) { write(xfs_file, 2MB); sleep(1); } 3. wait 5 seconds, sync the xfs fs, and wait 5 seconds 4. terminate the VM 5. start the VM and mount the xfs as read-only 6. remount the xfs as read-write or umount 7. check the unlinked list and the available freespace Cc: <stable@vger.kernel.org> [3.10+] Signed-off-by: Hou Tao <houtao1@huawei.com> --- fs/xfs/xfs_log_recover.c | 9 +++++++++ 1 file changed, 9 insertions(+)