[2/3] xfs: don't leak linked inodes during log recovery
diff mbox

Message ID 20170809010650.GF4474@magnolia
State New
Headers show

Commit Message

Darrick J. Wong Aug. 9, 2017, 1:06 a.m. UTC
When we introduced the bmap redo log items, we set MS_ACTIVE on the
mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes
from being truncated prematurely during log recovery.  However, we
neglected to drop linked inodes that are recovered, and if we don't use
the inode between recovery and unmount, the inode will never be marked
reclaimable and thus we fail to free it at umount time.  If we're in
log recovery but IRECOVERY is /not/ set, the inode is linked and can be
reclaimed.

Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_super.c |    7 +++++++
 1 file changed, 7 insertions(+)

--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Brian Foster Aug. 9, 2017, 12:36 p.m. UTC | #1
On Tue, Aug 08, 2017 at 06:06:50PM -0700, Darrick J. Wong wrote:
> When we introduced the bmap redo log items, we set MS_ACTIVE on the
> mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes
> from being truncated prematurely during log recovery.  However, we
> neglected to drop linked inodes that are recovered, and if we don't use
> the inode between recovery and unmount, the inode will never be marked
> reclaimable and thus we fail to free it at umount time.  If we're in
> log recovery but IRECOVERY is /not/ set, the inode is linked and can be
> reclaimed.
> 

I follow the change in behavior in the previous commit and how this
restores the original behavior for linked inodes, so this patch makes
sense from that perspective. I'm not following where/how the leak occurs
from the description, however. Wouldn't the inode end up on the lru to
be shrunk/evicted/reclaimed at a later point? What happens if the inode
is subsequently used that prevents the leak? (Whatever I'm missing, it
would be nice to elaborate on in the commit log.)

Also, if there is a memory leak vector for !drop linked inodes here,
does that not apply to XFS_IRECOVERY inodes if log recovery itself
happens to fail between bui recovery and iunlink processing?

Brian

> Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/xfs_super.c |    7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 38aaacd..9b06ca2 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1040,6 +1040,13 @@ xfs_fs_drop_inode(
>  	if (ip->i_flags & XFS_IRECOVERY) {
>  		ASSERT(ip->i_mount->m_log->l_flags & XLOG_RECOVERY_NEEDED);
>  		return 0;
> +	} else if (ip->i_mount->m_log->l_flags & XLOG_RECOVERY_NEEDED) {
> +		/*
> +		 * This inode was loaded during recovery but is not
> +		 * being unlinked, so we can free it without fear of
> +		 * premature truncation.
> +		 */
> +		return 1;
>  	}
>  
>  	return generic_drop_inode(inode) || (ip->i_flags & XFS_IDONTCACHE);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong Aug. 9, 2017, 4:49 p.m. UTC | #2
On Wed, Aug 09, 2017 at 08:36:49AM -0400, Brian Foster wrote:
> On Tue, Aug 08, 2017 at 06:06:50PM -0700, Darrick J. Wong wrote:
> > When we introduced the bmap redo log items, we set MS_ACTIVE on the
> > mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes
> > from being truncated prematurely during log recovery.  However, we
> > neglected to drop linked inodes that are recovered, and if we don't use
> > the inode between recovery and unmount, the inode will never be marked
> > reclaimable and thus we fail to free it at umount time.  If we're in
> > log recovery but IRECOVERY is /not/ set, the inode is linked and can be
> > reclaimed.
> > 
> 
> I follow the change in behavior in the previous commit and how this
> restores the original behavior for linked inodes, so this patch makes
> sense from that perspective. I'm not following where/how the leak occurs
> from the description, however.

Linked inodes are inode_add_lru()'d, but nothing ever calls
evict_inodes() to clean up that lru (sb->s_inodes).

> Wouldn't the inode end up on the lru to be shrunk/evicted/reclaimed at
> a later point?

> What happens if the inode is subsequently used that prevents the leak?
> (Whatever I'm missing, it would be nice to elaborate on in the commit
> log.)

If we make it all the way to a successful mount, then an unmount can
call generic_shutdown_super -> evict_inodes to clean up all the inodes
on the lru list.

> Also, if there is a memory leak vector for !drop linked inodes here,
> does that not apply to XFS_IRECOVERY inodes if log recovery itself
> happens to fail between bui recovery and iunlink processing?

Ugh, I forgot about that possibility.  I think the solution is to
evict_inodes right after we clear MS_ACTIVE but before we see if
xfs_log_mount_finish actually failed.

--D

> 
> Brian
> 
> > Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/xfs_super.c |    7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 38aaacd..9b06ca2 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -1040,6 +1040,13 @@ xfs_fs_drop_inode(
> >  	if (ip->i_flags & XFS_IRECOVERY) {
> >  		ASSERT(ip->i_mount->m_log->l_flags & XLOG_RECOVERY_NEEDED);
> >  		return 0;
> > +	} else if (ip->i_mount->m_log->l_flags & XLOG_RECOVERY_NEEDED) {
> > +		/*
> > +		 * This inode was loaded during recovery but is not
> > +		 * being unlinked, so we can free it without fear of
> > +		 * premature truncation.
> > +		 */
> > +		return 1;
> >  	}
> >  
> >  	return generic_drop_inode(inode) || (ip->i_flags & XFS_IDONTCACHE);
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Brian Foster Aug. 9, 2017, 5:17 p.m. UTC | #3
On Wed, Aug 09, 2017 at 09:49:29AM -0700, Darrick J. Wong wrote:
> On Wed, Aug 09, 2017 at 08:36:49AM -0400, Brian Foster wrote:
> > On Tue, Aug 08, 2017 at 06:06:50PM -0700, Darrick J. Wong wrote:
> > > When we introduced the bmap redo log items, we set MS_ACTIVE on the
> > > mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes
> > > from being truncated prematurely during log recovery.  However, we
> > > neglected to drop linked inodes that are recovered, and if we don't use
> > > the inode between recovery and unmount, the inode will never be marked
> > > reclaimable and thus we fail to free it at umount time.  If we're in
> > > log recovery but IRECOVERY is /not/ set, the inode is linked and can be
> > > reclaimed.
> > > 
> > 
> > I follow the change in behavior in the previous commit and how this
> > restores the original behavior for linked inodes, so this patch makes
> > sense from that perspective. I'm not following where/how the leak occurs
> > from the description, however.
> 
> Linked inodes are inode_add_lru()'d, but nothing ever calls
> evict_inodes() to clean up that lru (sb->s_inodes).
> 
> > Wouldn't the inode end up on the lru to be shrunk/evicted/reclaimed at
> > a later point?
> 
> > What happens if the inode is subsequently used that prevents the leak?
> > (Whatever I'm missing, it would be nice to elaborate on in the commit
> > log.)
> 
> If we make it all the way to a successful mount, then an unmount can
> call generic_shutdown_super -> evict_inodes to clean up all the inodes
> on the lru list.
> 

Ok, so you are saying that the leak occurs only if the mount ends up
failing because the vfs doesn't expect to have to evict lru inodes until
it sets the sb active, right? If so, that makes sense.. thanks.

FWIW, I think it's just more that the patch description is a bit
confusing here than anything. It doesn't mention mount failure and I'm
still a little confused about what you mean by: "if we don't use the
inode between recovery and unmount, the inode will never be marked
reclaimable and thus we fail to free it at umount time." Does that mean
that assuming imminent mount failure, the inode is leaked unless
something else happens to lookup/release the inode after we clear
MS_ACTIVE but before the mount failure sequence completes?

> > Also, if there is a memory leak vector for !drop linked inodes here,
> > does that not apply to XFS_IRECOVERY inodes if log recovery itself
> > happens to fail between bui recovery and iunlink processing?
> 
> Ugh, I forgot about that possibility.  I think the solution is to
> evict_inodes right after we clear MS_ACTIVE but before we see if
> xfs_log_mount_finish actually failed.
> 

That sounds reasonable. If we do that, I suppose that means we can drop
the additional logic in ->drop_inode() because that would cover both
linked and unlinked inodes that make it onto the lru during recovery.

Brian

> --D
> 
> > 
> > Brian
> > 
> > > Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  fs/xfs/xfs_super.c |    7 +++++++
> > >  1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > index 38aaacd..9b06ca2 100644
> > > --- a/fs/xfs/xfs_super.c
> > > +++ b/fs/xfs/xfs_super.c
> > > @@ -1040,6 +1040,13 @@ xfs_fs_drop_inode(
> > >  	if (ip->i_flags & XFS_IRECOVERY) {
> > >  		ASSERT(ip->i_mount->m_log->l_flags & XLOG_RECOVERY_NEEDED);
> > >  		return 0;
> > > +	} else if (ip->i_mount->m_log->l_flags & XLOG_RECOVERY_NEEDED) {
> > > +		/*
> > > +		 * This inode was loaded during recovery but is not
> > > +		 * being unlinked, so we can free it without fear of
> > > +		 * premature truncation.
> > > +		 */
> > > +		return 1;
> > >  	}
> > >  
> > >  	return generic_drop_inode(inode) || (ip->i_flags & XFS_IDONTCACHE);
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch
diff mbox

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 38aaacd..9b06ca2 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1040,6 +1040,13 @@  xfs_fs_drop_inode(
 	if (ip->i_flags & XFS_IRECOVERY) {
 		ASSERT(ip->i_mount->m_log->l_flags & XLOG_RECOVERY_NEEDED);
 		return 0;
+	} else if (ip->i_mount->m_log->l_flags & XLOG_RECOVERY_NEEDED) {
+		/*
+		 * This inode was loaded during recovery but is not
+		 * being unlinked, so we can free it without fear of
+		 * premature truncation.
+		 */
+		return 1;
 	}
 
 	return generic_drop_inode(inode) || (ip->i_flags & XFS_IDONTCACHE);