[2/2,V2] xfs: toggle readonly state around xfs_log_mount_finish

Message ID 74f6a009-211c-617a-2d6f-0a115ceb366b@sandeen.net
State Accepted

Commit Message

Eric Sandeen March 14, 2017, 11:23 p.m. UTC
When we do log recovery on a readonly mount, unlinked inode
processing does not happen due to the readonly checks in
xfs_inactive(), which are trying to prevent any I/O on a
readonly mount.

This is misguided - we do I/O on readonly mounts all the time,
for consistency; for example, log recovery.  So do the same
RDONLY flag twiddling around xfs_log_mount_finish() as we
do around xfs_log_mount(), for the same reason.

This all cries out for a big rework but for now this is a
simple fix to an obvious problem.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

V2: And now for something completely different...


--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Brian Foster March 15, 2017, 11:36 a.m. UTC | #1
On Tue, Mar 14, 2017 at 06:23:57PM -0500, Eric Sandeen wrote:
> When we do log recovery on a readonly mount, unlinked inode
> processing does not happen due to the readonly checks in
> xfs_inactive(), which are trying to prevent any I/O on a
> readonly mount.
> 
> This is misguided - we do I/O on readonly mounts all the time,
> for consistency; for example, log recovery.  So do the same
> RDONLY flag twiddling around xfs_log_mount_finish() as we
> do around xfs_log_mount(), for the same reason.
> 
> This all cries out for a big rework but for now this is a
> simple fix to an obvious problem.
> 
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
> 

Reviewed-by: Brian Foster <bfoster@redhat.com>

> V2: And now for something completely different...
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 62176b8..374edc1 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -743,16 +743,23 @@
>  	struct xfs_mount	*mp)
>  {
>  	int	error = 0;
> +	int	readonly = (mp->m_flags & XFS_MOUNT_RDONLY);
>  
>  	if (mp->m_flags & XFS_MOUNT_NORECOVERY) {
>  		ASSERT(mp->m_flags & XFS_MOUNT_RDONLY);
>  		return 0;
> +	} else if (readonly) {
> +		/* Allow unlinked processing to proceed */
> +		mp->m_flags &= ~XFS_MOUNT_RDONLY;
>  	}
>  
>  	error = xlog_recover_finish(mp->m_log);
>  	if (!error)
>  		xfs_log_work_queue(mp);
>  
> +	if (readonly)
> +		mp->m_flags |= XFS_MOUNT_RDONLY;
> +
>  	return error;
>  }
>  
> 
Darrick J. Wong March 16, 2017, 7:15 p.m. UTC | #2
On Wed, Mar 15, 2017 at 07:36:29AM -0400, Brian Foster wrote:
> On Tue, Mar 14, 2017 at 06:23:57PM -0500, Eric Sandeen wrote:
> > When we do log recovery on a readonly mount, unlinked inode
> > processing does not happen due to the readonly checks in
> > xfs_inactive(), which are trying to prevent any I/O on a
> > readonly mount.
> > 
> > This is misguided - we do I/O on readonly mounts all the time,
> > for consistency; for example, log recovery.  So do the same
> > RDONLY flag twiddling around xfs_log_mount_finish() as we
> > do around xfs_log_mount(), for the same reason.
> > 
> > This all cries out for a big rework but for now this is a
> > simple fix to an obvious problem.
> > 
> > Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> > ---
> > 

Both patches look ok, so I'll put them on the test queue for -rc4.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> > V2: And now for something completely different...
> > 
> > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > index 62176b8..374edc1 100644
> > --- a/fs/xfs/xfs_log.c
> > +++ b/fs/xfs/xfs_log.c
> > @@ -743,16 +743,23 @@
> >  	struct xfs_mount	*mp)
> >  {
> >  	int	error = 0;
> > +	int	readonly = (mp->m_flags & XFS_MOUNT_RDONLY);
> >  
> >  	if (mp->m_flags & XFS_MOUNT_NORECOVERY) {
> >  		ASSERT(mp->m_flags & XFS_MOUNT_RDONLY);
> >  		return 0;
> > +	} else if (readonly) {
> > +		/* Allow unlinked processing to proceed */
> > +		mp->m_flags &= ~XFS_MOUNT_RDONLY;
> >  	}
> >  
> >  	error = xlog_recover_finish(mp->m_log);
> >  	if (!error)
> >  		xfs_log_work_queue(mp);
> >  
> > +	if (readonly)
> > +		mp->m_flags |= XFS_MOUNT_RDONLY;
> > +
> >  	return error;
> >  }
> >  
> > 
Dave Chinner March 16, 2017, 11:42 p.m. UTC | #3
On Thu, Mar 16, 2017 at 12:15:00PM -0700, Darrick J. Wong wrote:
> On Wed, Mar 15, 2017 at 07:36:29AM -0400, Brian Foster wrote:
> > On Tue, Mar 14, 2017 at 06:23:57PM -0500, Eric Sandeen wrote:
> > > When we do log recovery on a readonly mount, unlinked inode
> > > processing does not happen due to the readonly checks in
> > > xfs_inactive(), which are trying to prevent any I/O on a
> > > readonly mount.
> > > 
> > > This is misguided - we do I/O on readonly mounts all the time,
> > > for consistency; for example, log recovery.  So do the same
> > > RDONLY flag twiddling around xfs_log_mount_finish() as we
> > > do around xfs_log_mount(), for the same reason.
> > > 
> > > This all cries out for a big rework but for now this is a
> > > simple fix to an obvious problem.
> > > 
> > > Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> > > ---
> > > 
> 
> Both patches look ok, so I'll put them on the test queue for -rc4.
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

FWIW, I don't think this is a -rc candidate. Making log recovery
process unlinked inode transactions on read-only mounts is a pretty
major change in behaviour. Who knows exactly what dragons are
lurking at lower layers that have never been run in this context
until now.

Also, it's not urgent - we've lived with this behaviour for years -
so waiting a month for the next merge window is not going to hurt
anyone and it gives us a chance to test it - XFS developers are the
people who should be burnt by the lurking dragons, not users who
updated to a late -rcX kernel....

Cheers,

Dave.
Eric Sandeen March 16, 2017, 11:52 p.m. UTC | #4
On 3/16/17 4:42 PM, Dave Chinner wrote:
> On Thu, Mar 16, 2017 at 12:15:00PM -0700, Darrick J. Wong wrote:
>> On Wed, Mar 15, 2017 at 07:36:29AM -0400, Brian Foster wrote:
>>> On Tue, Mar 14, 2017 at 06:23:57PM -0500, Eric Sandeen wrote:
>>>> When we do log recovery on a readonly mount, unlinked inode
>>>> processing does not happen due to the readonly checks in
>>>> xfs_inactive(), which are trying to prevent any I/O on a
>>>> readonly mount.
>>>>
>>>> This is misguided - we do I/O on readonly mounts all the time,
>>>> for consistency; for example, log recovery.  So do the same
>>>> RDONLY flag twiddling around xfs_log_mount_finish() as we
>>>> do around xfs_log_mount(), for the same reason.
>>>>
>>>> This all cries out for a big rework but for now this is a
>>>> simple fix to an obvious problem.
>>>>
>>>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>>>> ---
>>>>
>>
>> Both patches look ok, so I'll put them on the test queue for -rc4.
>> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> FWIW, I don't think this is a -rc candidate. Making log recovery
> process unlinked inode transactions on read-only mounts is a pretty
> major change in behaviour. Who knows exactly what dragons are
> lurking at lower layers that have never been run in this context
> until now.
> 
> Also, it's not urgent - we've lived with this behaviour for years -
> so waiting a month for the next merge window is not going to hurt
> anyone and it gives us a chance to test it - XFS developers are the
> people who should be burnt by the lurking dragons, not users who
> updated to a late -rcX kernel....

To shield Darrick a bit ;) I was agitating/asking for sooner, but
admittedly that was a little bit selfish on my part.

Still, we have had field reports of people with /gigabytes/ missing
from the root filesystem, and it was not fixable without an 
xfs_repair.  Which on a root filesystem is ... special.

So, my fault for getting it sent late, for sure - but I do think it's
an important fix.  I know we can't really address the "unknown unknown"
dragons easily, but actually completing recovery on RO mounts seems
straightforward to me... we allow half of recovery to go, and
disallow the other half.  Seems plainly broken.

-Eric
Dave Chinner March 18, 2017, 7:38 a.m. UTC | #5
On Thu, Mar 16, 2017 at 04:52:43PM -0700, Eric Sandeen wrote:
> On 3/16/17 4:42 PM, Dave Chinner wrote:
> > On Thu, Mar 16, 2017 at 12:15:00PM -0700, Darrick J. Wong wrote:
> >> On Wed, Mar 15, 2017 at 07:36:29AM -0400, Brian Foster wrote:
> >>> On Tue, Mar 14, 2017 at 06:23:57PM -0500, Eric Sandeen wrote:
> >>>> When we do log recovery on a readonly mount, unlinked inode
> >>>> processing does not happen due to the readonly checks in
> >>>> xfs_inactive(), which are trying to prevent any I/O on a
> >>>> readonly mount.
> >>>>
> >>>> This is misguided - we do I/O on readonly mounts all the time,
> >>>> for consistency; for example, log recovery.  So do the same
> >>>> RDONLY flag twiddling around xfs_log_mount_finish() as we
> >>>> do around xfs_log_mount(), for the same reason.
> >>>>
> >>>> This all cries out for a big rework but for now this is a
> >>>> simple fix to an obvious problem.
> >>>>
> >>>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> >>>> ---
> >>>>
> >>
> >> Both patches look ok, so I'll put them on the test queue for -rc4.
> >> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > FWIW, I don't think this is a -rc candidate. Making log recovery
> > process unlinked inode transactions on read-only mounts is a pretty
> > major change in behaviour. Who knows exactly what dragons are
> > lurking at lower layers that have never been run in this context
> > until now.
> > 
> > Also, it's not urgent - we've lived with this behaviour for years -
> > so waiting a month for the next merge window is not going to hurt
> > anyone and it gives us a chance to test it - XFS developers are the
> > people who should be burnt by the lurking dragons, not users who
> > updated to a late -rcX kernel....
> 
> To shield Darrick a bit ;) I was agitating/asking for sooner, but
> admittedly that was a little bit selfish on my part.
> 
> Still, we have had field reports of people with /gigabytes/ missing
> from the root filesystem, and it was not fixable without an 
> xfs_repair.  Which on a root filesystem is ... special.

That's information that should be in the commit message....

> So, my fault for getting it sent late, for sure - but I do think it's
> an important fix.  I know we can't really address the "unknown unknown"
> dragons easily, but actually completing recovery on RO mounts seems
> straightforward to me... we allow half of recovery to go, and
> disallow the other half.  Seems plainly broken.

I still don't think that makes it an urgent, immediate -rcX fix.  It
definitely makes it a fix that should go to stable kernels, but that
does not mean we should short-cut our integration and testing
processes. If anything, it makes it far more important to ensure the
change is safe and well tested, because it's going to be distributed
to /everyone/ in the near future through the stable update process,
distros included.

As I've already said: rushing fixes upstream without adequate test
time is almost always the wrong thing to do. Call me conservative,
but I have plenty of scars to justify being careful about pushing
fixes too quickly.

I'm more worried about the impact on the unknown number of read-only
filesystems out there across the entire userbase that have the
potential to process inodes that have been sitting orphaned for
years than I am about the few recent users who have had to run
xfs_repair on their root filesystem to fix this up due to the nature
of ro->rw transition in root filesystem mounting.  Let's make really
sure everything is OK before we expose it to all our users running
stable/distro kernels....

Cheers,

Dave.
Darrick J. Wong March 27, 2017, 5:16 p.m. UTC | #6
On Sat, Mar 18, 2017 at 06:38:35PM +1100, Dave Chinner wrote:
> On Thu, Mar 16, 2017 at 04:52:43PM -0700, Eric Sandeen wrote:
> > On 3/16/17 4:42 PM, Dave Chinner wrote:
> > > On Thu, Mar 16, 2017 at 12:15:00PM -0700, Darrick J. Wong wrote:
> > >> On Wed, Mar 15, 2017 at 07:36:29AM -0400, Brian Foster wrote:
> > >>> On Tue, Mar 14, 2017 at 06:23:57PM -0500, Eric Sandeen wrote:
> > >>>> When we do log recovery on a readonly mount, unlinked inode
> > >>>> processing does not happen due to the readonly checks in
> > >>>> xfs_inactive(), which are trying to prevent any I/O on a
> > >>>> readonly mount.
> > >>>>
> > >>>> This is misguided - we do I/O on readonly mounts all the time,
> > >>>> for consistency; for example, log recovery.  So do the same
> > >>>> RDONLY flag twiddling around xfs_log_mount_finish() as we
> > >>>> do around xfs_log_mount(), for the same reason.
> > >>>>
> > >>>> This all cries out for a big rework but for now this is a
> > >>>> simple fix to an obvious problem.
> > >>>>
> > >>>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> > >>>> ---
> > >>>>
> > >>
> > >> Both patches look ok, so I'll put them on the test queue for -rc4.
> > >> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > FWIW, I don't think this is a -rc candidate. Making log recovery
> > > process unlinked inode transactions on read-only mounts is a pretty
> > > major change in behaviour. Who knows exactly what dragons are
> > > lurking at lower layers that have never been run in this context
> > > until now.
> > > 
> > > Also, it's not urgent - we've lived with this behaviour for years -
> > > so waiting a month for the next merge window is not going to hurt
> > > anyone and it gives us a chance to test it - XFS developers are the
> > > people who should be burnt by the lurking dragons, not users who
> > > updated to a late -rcX kernel....
> > 
> > To shield Darrick a bit ;) I was agitating/asking for sooner, but
> > admittedly that was a little bit selfish on my part.
> > 
> > Still, we have had field reports of people with /gigabytes/ missing
> > from the root filesystem, and it was not fixable without an 
> > xfs_repair.  Which on a root filesystem is ... special.
> 
> That's information that should be in the commit message....
> 
> > So, my fault for getting it sent late, for sure - but I do think it's
> > an important fix.  I know we can't really address the "unknown unknown"
> > dragons easily, but actually completing recovery on RO mounts seems
> > straightforward to me... we allow half of recovery to go, and
> > disallow the other half.  Seems plainly broken.
> 
> I still don't think that makes it an urgent, immediate -rcX fix.  It
> definitely makes it a fix that should go to stable kernels, but that
> > does not mean we should short-cut our integration and testing
> processes. If anything, it makes it far more important to ensure the
> change is safe and well tested, because it's going to be distributed
> to /everyone/ in the near future through the stable update process,
> distros included.
> 
> As I've already said: rushing fixes upstream without adequate test
> time is almost always the wrong thing to do. Call me conservative,
> but I have plenty of scars to justify being careful about pushing
> fixes too quickly.
> 
> I'm more worried about the impact on the unknown number of read-only
> filesystems out there across the entire userbase that have the
> potential to process inodes that have been sitting orphaned for
> years than I am about the few recent users who have had to run
> > xfs_repair on their root filesystem to fix this up due to the nature
> of ro->rw transition in root filesystem mounting.  Let's make really
> sure everything is OK before we expose it to all our users running
> stable/distro kernels....

FWIW I let this run w/ all my testing configs during LSF/Vault last week
and I didn't see any new failures.  I'll hold off on sending these patches.

But, waiting for 4.12 does provide the opportunity to add more stressful
tests than what generic/417 does now.  How about a test that creates a
big directory structure + some heavily fragmented files, then opens all
of those files, deletes the directory tree, shuts down the fs, then
attempts a ro mode recovery?  That way we have a lot of files and a lot
of bmap records to get rid of during mount.
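
The "open all of those files, delete them" step of the proposed test can be
sketched in userspace C. This is only an illustration, not the xfstests test
itself: the helper names demo_make_orphans() and demo_is_orphan() are
hypothetical, the directory is whatever the caller passes in, and the
filesystem shutdown and ro-mount recovery steps are not shown. It only
demonstrates how files end up unlinked while still held open, which is the
state that leaves inodes on the unlinked list if the fs goes down.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create up to nfiles files under dir, keep each one open and unlink it.
 * Returns how many open, unlinked descriptors were stored in fds[];
 * stops at the first failure.  The caller closes fds[0..ret-1].
 */
static int demo_make_orphans(const char *dir, int *fds, int nfiles)
{
	char path[4096];
	int i;

	for (i = 0; i < nfiles; i++) {
		snprintf(path, sizeof(path), "%s/orphan.%d", dir, i);
		fds[i] = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
		if (fds[i] < 0)
			break;
		/* Give the inode some data so there is something to free. */
		if (write(fds[i], "x", 1) != 1 || unlink(path) < 0) {
			close(fds[i]);
			break;
		}
	}
	return i;
}

/* Returns 1 if fd refers to an inode with no remaining links. */
static int demo_is_orphan(int fd)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return 0;
	return st.st_nlink == 0;
}
```

A real test would do this on a scratch XFS filesystem and force a shutdown
while the descriptors are still open, so that the inodes are still on the
unlinked list when the log is recovered.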

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

Patch

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 62176b8..374edc1 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -743,16 +743,23 @@ 
 	struct xfs_mount	*mp)
 {
 	int	error = 0;
+	int	readonly = (mp->m_flags & XFS_MOUNT_RDONLY);
 
 	if (mp->m_flags & XFS_MOUNT_NORECOVERY) {
 		ASSERT(mp->m_flags & XFS_MOUNT_RDONLY);
 		return 0;
+	} else if (readonly) {
+		/* Allow unlinked processing to proceed */
+		mp->m_flags &= ~XFS_MOUNT_RDONLY;
 	}
 
 	error = xlog_recover_finish(mp->m_log);
 	if (!error)
 		xfs_log_work_queue(mp);
 
+	if (readonly)
+		mp->m_flags |= XFS_MOUNT_RDONLY;
+
 	return error;
 }
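
The save/clear/restore pattern in the patch above can be tried outside the
kernel with a minimal userspace sketch. Everything here is a stand-in and an
assumption for illustration: struct demo_mount, the DEMO_* flags and
demo_process_unlinked() are hypothetical names, not XFS code. The stand-in
work function silently skips its work when the read-only flag is set, which
is roughly how the commit message describes the readonly checks in
xfs_inactive().

```c
#include <assert.h>

/* Hypothetical stand-ins for XFS_MOUNT_RDONLY and XFS_MOUNT_NORECOVERY. */
#define DEMO_RDONLY	(1u << 0)
#define DEMO_NORECOVERY	(1u << 1)

struct demo_mount {
	unsigned int	flags;
	int		unlinked_processed;	/* set once the work has run */
};

/* Stand-in for work that refuses to run while the mount is read-only. */
static int demo_process_unlinked(struct demo_mount *mp)
{
	if (mp->flags & DEMO_RDONLY)
		return 0;	/* skipped, no error reported */
	mp->unlinked_processed = 1;
	return 0;
}

/* Same shape as the patched function: save, clear, work, restore. */
static int demo_mount_finish(struct demo_mount *mp)
{
	int		error;
	unsigned int	readonly = mp->flags & DEMO_RDONLY;

	if (mp->flags & DEMO_NORECOVERY) {
		assert(mp->flags & DEMO_RDONLY);
		return 0;
	}
	if (readonly)
		mp->flags &= ~DEMO_RDONLY;	/* let the work proceed */

	error = demo_process_unlinked(mp);

	if (readonly)
		mp->flags |= DEMO_RDONLY;	/* restore on every exit path */

	return error;
}
```

Calling demo_mount_finish() on a mount with only DEMO_RDONLY set runs the
work and leaves the flag set afterwards; with DEMO_NORECOVERY also set, the
function returns early and the work is never attempted.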