
xfs: pin inodes that would otherwise overflow link count

Message ID 20231011203350.GY21298@frogsfrogsfrogs (mailing list archive)
State Superseded
Headers show
Series xfs: pin inodes that would otherwise overflow link count

Commit Message

Darrick J. Wong Oct. 11, 2023, 8:33 p.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

The VFS inc_nlink function does not explicitly check for integer
overflows in the i_nlink field.  Instead, it checks the link count
against s_max_links in the vfs_{link,create,rename} functions.  XFS
sets the maximum link count to 2.1 billion, so integer overflows should
not be a problem.

However.  It's possible that online repair could find that a file has
more than four billion links, particularly if the link count got
corrupted while creating hardlinks to the file.  The di_nlinkv2 field is
not large enough to store a value larger than 2^32, so we ought to
define a magic pin value of ~0U which means that the inode never gets
deleted.  This will prevent a UAF error if the repair finds this
situation and users begin deleting links to the file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_format.h |    6 ++++++
 fs/xfs/scrub/nlinks.c      |    8 ++++----
 fs/xfs/scrub/repair.c      |   12 ++++++------
 fs/xfs/xfs_inode.c         |   28 +++++++++++++++++++++++-----
 4 files changed, 39 insertions(+), 15 deletions(-)

Comments

Dave Chinner Oct. 11, 2023, 9:08 p.m. UTC | #1
On Wed, Oct 11, 2023 at 01:33:50PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> The VFS inc_nlink function does not explicitly check for integer
> overflows in the i_nlink field.  Instead, it checks the link count
> against s_max_links in the vfs_{link,create,rename} functions.  XFS
> sets the maximum link count to 2.1 billion, so integer overflows should
> not be a problem.
> 
> However.  It's possible that online repair could find that a file has
> more than four billion links, particularly if the link count got

I don't think we should be attempting to fix that online - if we've
really found an inode with 4 billion links then something else has
gone wrong during repair because we shouldn't get there in the first
place.

IOWs, we should be preventing a link count overflow at the time 
that the link count is being added and returning -EMLINK errors to
that operation. This prevents overflow, and so if repair does find
more than 2.1 billion links to the inode, there's clearly something
else very wrong (either in repair or a bug in the filesystem that
has leaked many, many link counts).

huh.

We set sb->s_max_links = XFS_MAXLINKS, but nowhere does the VFS
enforce that, nor does any XFS code. The lack of checking or
enforcement of filesystem max link count anywhere is ... not ideal.

Regardless, I don't think fixing nlink overflow cases should be done
online. A couple of billion links to a single inode takes a
*long* time to create and even longer to validate (and takes a -lot-
of memory).  Hence I think we should just punt "more than 2.1
billion links" to the offline repair, because online repair can't do
anything to actually find whatever caused the overflow in the
first place, nor can it actually fix it - it should never have
happened in the first place....

> corrupted while creating hardlinks to the file.  The di_nlinkv2 field is
> not large enough to store a value larger than 2^32, so we ought to
> define a magic pin value of ~0U which means that the inode never gets
> deleted.  This will prevent a UAF error if the repair finds this
> situation and users begin deleting links to the file.

I think that's fine as a defence against implementation bugs, but I
don't think that it really makes any real difference to the "repair
might find an overflow" case...

> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_format.h |    6 ++++++
>  fs/xfs/scrub/nlinks.c      |    8 ++++----
>  fs/xfs/scrub/repair.c      |   12 ++++++------
>  fs/xfs/xfs_inode.c         |   28 +++++++++++++++++++++++-----
>  4 files changed, 39 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index 6409dd22530f2..320522b887bb3 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -896,6 +896,12 @@ static inline uint xfs_dinode_size(int version)
>   */
>  #define	XFS_MAXLINK		((1U << 31) - 1U)
>  
> +/*
> + * Any file that hits the maximum ondisk link count should be pinned to avoid
> + * a use-after-free situation.
> + */
> +#define XFS_NLINK_PINNED	(~0U)
> +
>  /*
>   * Values for di_format
>   *
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 4db2c2a6538d6..30604e11182c4 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -910,15 +910,25 @@ xfs_init_new_inode(
>   */
>  static int			/* error */
>  xfs_droplink(
> -	xfs_trans_t *tp,
> -	xfs_inode_t *ip)
> +	struct xfs_trans	*tp,
> +	struct xfs_inode	*ip)
>  {
> +	struct inode		*inode = VFS_I(ip);
> +
>  	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
>  
> -	drop_nlink(VFS_I(ip));
> +	if (inode->i_nlink == 0) {
> +		xfs_info_ratelimited(tp->t_mountp,
> + "Inode 0x%llx link count dropped below zero.  Pinning link count.",
> +				ip->i_ino);
> +		set_nlink(inode, XFS_NLINK_PINNED);
> +	}
> +	if (inode->i_nlink != XFS_NLINK_PINNED)
> +		drop_nlink(inode);
> +
>  	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);

I think the di_nlink field now needs to be checked by verifiers to
ensure the value is in the range of:

	(0 <= di_nlink <= XFS_MAXLINKS || di_nlink == XFS_NLINK_PINNED)

And we need to ensure that in xfs_bumplink() - or earlier (to avoid
dirty transaction cancel shutdowns) - that adding a count to di_nlink is
not going to exceed XFS_MAXLINKS....
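A minimal sketch of that verifier predicate, with the two constants as the
patch defines them (the helper name here is hypothetical; the real check
would live with the other dinode verifiers):

```c
#include <stdbool.h>
#include <stdint.h>

#define XFS_MAXLINK		((1U << 31) - 1U)
#define XFS_NLINK_PINNED	(~0U)

/* Accept link counts in [0, XFS_MAXLINK] plus the magic pin value;
 * anything else is corruption.  di_nlink is unsigned, so the lower
 * bound is implicit. */
static bool xfs_dinode_nlink_valid(uint32_t di_nlink)
{
	return di_nlink <= XFS_MAXLINK || di_nlink == XFS_NLINK_PINNED;
}
```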

Cheers,

Dave.
Dave Chinner Oct. 11, 2023, 9:14 p.m. UTC | #2
On Thu, Oct 12, 2023 at 08:08:20AM +1100, Dave Chinner wrote:
> On Wed, Oct 11, 2023 at 01:33:50PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > The VFS inc_nlink function does not explicitly check for integer
> > overflows in the i_nlink field.  Instead, it checks the link count
> > against s_max_links in the vfs_{link,create,rename} functions.  XFS
> > sets the maximum link count to 2.1 billion, so integer overflows should
> > not be a problem.
> > 
> > However.  It's possible that online repair could find that a file has
> > more than four billion links, particularly if the link count got
> 
> I don't think we should be attempting to fix that online - if we've
> really found an inode with 4 billion links then something else has
> gone wrong during repair because we shouldn't get there in the first
> place.
> 
> IOWs, we should be preventing a link count overflow at the time 
> that the link count is being added and returning -EMLINK errors to
> that operation. This prevents overflow, and so if repair does find
> more than 2.1 billion links to the inode, there's clearly something
> else very wrong (either in repair or a bug in the filesystem that
> has leaked many, many link counts).
> 
> huh.
> 
> We set sb->s_max_links = XFS_MAXLINKS, but nowhere does the VFS
> enforce that, nor does any XFS code. The lack of checking or
> enforcement of filesystem max link count anywhere is ... not ideal.

No, wait, I read the cscope output wrong. sb->s_max_links *is*
enforced at the VFS level, so we should never end up in a situation
with link count greater than XFS_MAXLINKS inside the XFS code in
normal operation. i.e. A count greater than that is an indication of
a software bug or corruption, so we should definitely be verifying
di_nlink is within the valid on-disk range regardless of anything
else....

-Dave.
Darrick J. Wong Oct. 11, 2023, 9:25 p.m. UTC | #3
On Thu, Oct 12, 2023 at 08:14:44AM +1100, Dave Chinner wrote:
> On Thu, Oct 12, 2023 at 08:08:20AM +1100, Dave Chinner wrote:
> > On Wed, Oct 11, 2023 at 01:33:50PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > The VFS inc_nlink function does not explicitly check for integer
> > > overflows in the i_nlink field.  Instead, it checks the link count
> > > against s_max_links in the vfs_{link,create,rename} functions.  XFS
> > > sets the maximum link count to 2.1 billion, so integer overflows should
> > > not be a problem.
> > > 
> > > However.  It's possible that online repair could find that a file has
> > > more than four billion links, particularly if the link count got
> > 
> > I don't think we should be attempting to fix that online - if we've
> > really found an inode with 4 billion links then something else has
> > gone wrong during repair because we shouldn't get there in the first
> > place.
> > 
> > IOWs, we should be preventing a link count overflow at the time 
> > that the link count is being added and returning -EMLINK errors to
> > that operation. This prevents overflow, and so if repair does find
> > more than 2.1 billion links to the inode, there's clearly something
> > else very wrong (either in repair or a bug in the filesystem that
> > has leaked many, many link counts).
> > 
> > huh.
> > 
> > We set sb->s_max_links = XFS_MAXLINKS, but nowhere does the VFS
> > enforce that, nor does any XFS code. The lack of checking or
> > enforcement of filesystem max link count anywhere is ... not ideal.
> 
> No, wait, I read the cscope output wrong. sb->s_max_links *is*
> enforced at the VFS level, so we should never end up in a situation
> with link count greater than XFS_MAXLINKS inside the XFS code in
> normal operation. i.e. A count greater than that is an indication of
> a software bug or corruption, so we should definitely be verifying
> di_nlink is within the valid on-disk range regardless of anything
> else....

... and I just realized that the VFS doesn't check for underflows when
unlinking or rmdir'ing.  Maybe it should be doing that instead of
patching XFS and everything else?
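A sketch of what such a VFS-side guard could look like, modeled here with a
plain counter rather than a struct inode (hypothetical helper name; the
actual check would sit in the vfs_unlink()/vfs_rmdir() paths):

```c
#include <errno.h>

/* Refuse to decrement a link count that is already zero, instead of
 * letting the caller silently underflow it. */
static int drop_nlink_checked(unsigned int *i_nlink)
{
	if (*i_nlink == 0)
		return -EINVAL;		/* would underflow: flag corruption */
	(*i_nlink)--;
	return 0;
}
```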

--D

> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
Darrick J. Wong Oct. 11, 2023, 9:41 p.m. UTC | #4
On Thu, Oct 12, 2023 at 08:08:20AM +1100, Dave Chinner wrote:
> On Wed, Oct 11, 2023 at 01:33:50PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > The VFS inc_nlink function does not explicitly check for integer
> > overflows in the i_nlink field.  Instead, it checks the link count
> > against s_max_links in the vfs_{link,create,rename} functions.  XFS
> > sets the maximum link count to 2.1 billion, so integer overflows should
> > not be a problem.
> > 
> > However.  It's possible that online repair could find that a file has
> > more than four billion links, particularly if the link count got
> 
> I don't think we should be attempting to fix that online - if we've
> really found an inode with 4 billion links then something else has
> gone wrong during repair because we shouldn't get there in the first
> place.

I don't agree -- if online repair really does find 3 billion dirents
pointing to a (very) hardlinked file, then it should set the link count
to 3 billion.  The VFS will not let userspace add more hardlinks, but it
will let userspace remove hardlinks.

> IOWs, we should be preventing a link count overflow at the time 
> that the link count is being added and returning -EMLINK errors to
> that operation. This prevents overflow, and so if repair does find
> more than 2.1 billion links to the inode, there's clearly something
> else very wrong (either in repair or a bug in the filesystem that
> has leaked many, many link counts).
> 
> huh.
> 
> We set sb->s_max_links = XFS_MAXLINKS, but nowhere does the VFS
> enforce that, nor does any XFS code. The lack of checking or
> enforcement of filesystem max link count anywhere is ... not ideal.

[Which the VFS does, so I'll set aside the last 2 paragraphs.]

> Regardless, I don't think fixing nlink overflow cases should be done
> online. A couple of billion links to a single inode takes a
> *long* time to create and even longer to validate (and takes a -lot-
> of memory).

xfs_repair will burn a lot of memory and time doing that; xfs_scrub will
take just as much time but only O(icount) memory.

> Hence I think we should just punt "more than 2.1
> billion links" to the offline repair, because online repair can't do
> anything to actually find whatever caused the overflow in the
> first place, nor can it actually fix it - it should never have
> happened in the first place....

I don't think deleting dirents to reduce link count is a good idea,
since the repair tool will have no idea which directory links are more
deletable than others.

If repair finds XFS_MAXLINKS < nr_dirents < -1U, then I think we should
reset the link count and let userspace decide if they're going to unlink
the file to reduce the link count.  That's already what xfs_repair does,
and xfs_scrub follows that behavior.

For nr_dirents > -1U, online repair just skips the file and reports that
repairs didn't succeed.  xfs_repair overflows the u32 and won't notice
that it's now set the link count to something suspiciously low.

For nr_dirents <= XFS_MAXLINKS, both repair tools reset the link count,
end of story.
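The three cases above can be sketched as a decision helper (hypothetical
names; nr_dirents widened to 64 bits so the overflowed case is even
representable):

```c
#include <stdint.h>

#define XFS_MAXLINK	((1U << 31) - 1U)

enum nlink_repair_action {
	NLINK_RESET,		/* set the link count to nr_dirents */
	NLINK_RESET_OVERMAX,	/* reset; userspace must unlink to shrink it */
	NLINK_SKIP,		/* not representable: report repair failure */
};

/* Decision table sketched from the discussion above. */
static enum nlink_repair_action
nlink_repair_action(uint64_t nr_dirents)
{
	if (nr_dirents > UINT32_MAX)
		return NLINK_SKIP;
	if (nr_dirents > XFS_MAXLINK)
		return NLINK_RESET_OVERMAX;
	return NLINK_RESET;
}
```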

(This patch comes at the end of the online nlink repair series, where a
lot of this nuance would be easier to spot.  I was originally going to
leave it until we got to that part of online repair, but then this
other thing came in and I felt it important to get it on the list, even
at the price of incohesion.  Oh well.)

> > corrupted while creating hardlinks to the file.  The di_nlinkv2 field is
> > not large enough to store a value larger than 2^32, so we ought to
> > define a magic pin value of ~0U which means that the inode never gets
> > deleted.  This will prevent a UAF error if the repair finds this
> > situation and users begin deleting links to the file.
> 
> I think that's fine as a defence against implementation bugs, but I
> don't think that it really makes any real difference to the "repair
> might find an overflow" case...

Fair enough.

--D

> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/xfs/libxfs/xfs_format.h |    6 ++++++
> >  fs/xfs/scrub/nlinks.c      |    8 ++++----
> >  fs/xfs/scrub/repair.c      |   12 ++++++------
> >  fs/xfs/xfs_inode.c         |   28 +++++++++++++++++++++++-----
> >  4 files changed, 39 insertions(+), 15 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> > index 6409dd22530f2..320522b887bb3 100644
> > --- a/fs/xfs/libxfs/xfs_format.h
> > +++ b/fs/xfs/libxfs/xfs_format.h
> > @@ -896,6 +896,12 @@ static inline uint xfs_dinode_size(int version)
> >   */
> >  #define	XFS_MAXLINK		((1U << 31) - 1U)
> >  
> > +/*
> > + * Any file that hits the maximum ondisk link count should be pinned to avoid
> > + * a use-after-free situation.
> > + */
> > +#define XFS_NLINK_PINNED	(~0U)
> > +
> >  /*
> >   * Values for di_format
> >   *
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index 4db2c2a6538d6..30604e11182c4 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -910,15 +910,25 @@ xfs_init_new_inode(
> >   */
> >  static int			/* error */
> >  xfs_droplink(
> > -	xfs_trans_t *tp,
> > -	xfs_inode_t *ip)
> > +	struct xfs_trans	*tp,
> > +	struct xfs_inode	*ip)
> >  {
> > +	struct inode		*inode = VFS_I(ip);
> > +
> >  	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
> >  
> > -	drop_nlink(VFS_I(ip));
> > +	if (inode->i_nlink == 0) {
> > +		xfs_info_ratelimited(tp->t_mountp,
> > + "Inode 0x%llx link count dropped below zero.  Pinning link count.",
> > +				ip->i_ino);
> > +		set_nlink(inode, XFS_NLINK_PINNED);
> > +	}
> > +	if (inode->i_nlink != XFS_NLINK_PINNED)
> > +		drop_nlink(inode);
> > +
> >  	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> 
> I think the di_nlink field now needs to be checked by verifiers to
> ensure the value is in the range of:
> 
> 	(0 <= di_nlink <= XFS_MAXLINKS || di_nlink == XFS_NLINK_PINNED)
> 
> And we need to ensure that in xfs_bumplink() - or earlier (to avoid
> dirty transaction cancel shutdowns) - that adding a count to di_nlink is
> not going to exceed XFS_MAXLINKS....

I think the VFS needs to check that unlinking a nondirectory won't
underflow its link count, and that rmdiring an (empty) subdirectory
won't underflow the link counts of the parent or child.

Cheng Lin, would you please fix all the filesystems at once instead of
just XFS?

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
Dave Chinner Oct. 11, 2023, 10:21 p.m. UTC | #5
On Wed, Oct 11, 2023 at 02:41:05PM -0700, Darrick J. Wong wrote:
> On Thu, Oct 12, 2023 at 08:08:20AM +1100, Dave Chinner wrote:
> > On Wed, Oct 11, 2023 at 01:33:50PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > The VFS inc_nlink function does not explicitly check for integer
> > > overflows in the i_nlink field.  Instead, it checks the link count
> > > against s_max_links in the vfs_{link,create,rename} functions.  XFS
> > > sets the maximum link count to 2.1 billion, so integer overflows should
> > > not be a problem.
> > > 
> > > However.  It's possible that online repair could find that a file has
> > > more than four billion links, particularly if the link count got
> > 
> > I don't think we should be attempting to fix that online - if we've
> > really found an inode with 4 billion links then something else has
> > gone wrong during repair because we shouldn't get there in the first
> > place.
> 
> I don't agree -- if online repair really does find 3 billion dirents
> pointing to a (very) hardlinked file, then it should set the link count
> to 3 billion.  The VFS will not let userspace add more hardlinks, but it
> will let userspace remove hardlinks.

Yet that leaves the inode with a corrupted link count according to
the on disk format definition. We know stuff is going to go wrong
when the user starts trying to remove hardlinks, which will result
in having repair run again to (eventually) remove the PINNED value.

The situation is no different to having 5 billion hard links -
that's as invalid as having 3 billion hard links - but we are using
language lawyering to split the fine hair that is "corrupt but
technically still usable" and "corrupt and unrecoverable".

Both situations are less than ideal, and if we solve the 5 billion
hardlink case, there is no reason at all for letting this whacky
"treat unlinked as negative until it pins" overflow case exist at
all.

> > Regardless, I don't think fixing nlink overflow cases should be done
> > online. A couple of billion links to a single inode takes a
> *long* time to create and even longer to validate (and takes a -lot-
> > of memory).
> 
> xfs_repair will burn a lot of memory and time doing that; xfs_scrub will
> take just as much time but only O(icount) memory.

Yup, I've seen repair take *3 weeks* and 250GB of RAM to validate
production filesystems with a couple of billion hard links and
individual inode link counts in the ~100million range. We're talking
about an order of magnitude higher link counts here - the validation
runtime alone is massively problematic, let alone deciding what
should be done with overflows. Fixing this requires human
intervention to decide what to do...

> > Hence I think we should just punt "more than 2.1
> > billion links" to the offline repair, because online repair can't do
> > anything to actually find whatever caused the overflow in the
> > first place, nor can it actually fix it - it should never have
> > happened in the first place....
> 
> I don't think deleting dirents to reduce link count is a good idea,
> since the repair tool will have no idea which directory links are more
> deletable than others.

I never said we should delete directory links. I said we should punt
it to an admin to decide how to fix (i.e. to an offline repair
context).

> If repair finds XFS_MAXLINKS < nr_dirents < -1U, then I think we should
> reset the link count and let userspace decide if they're going to unlink
> the file to reduce the link count.  That's already what xfs_repair does,
> and xfs_scrub follows that behavior.
> 
> For nr_dirents > -1U, online repair just skips the file and reports that
> repairs didn't succeed.  xfs_repair overflows the u32 and won't notice
> that it's now set the link count to something suspiciously low.

Fixing this requires human intervention to decide what to do. If
it's a hardlink backup farm (the cases I've seen with this scale of
link counts) then the trivial solution is to duplicate the inode
everything is linked to and then move all the overflows to the
duplicate(s).

repair *could* do this automatically by duplicating the source inode
into lost+found and redirecting overflowed dirents to them. It
doesn't matter where the duplicated inodes live - there will be
directories pointing to them from all over the place. Doing this
will not result in any loss of data or directory entries - it just
means that the "shared" inode is not a single unique inode anymore.

If we solve the larger overflow problem this way, then the
XFS_MAXLINKS < nr_dirents < -1U case can also use this solution and
we no longer need to support a whacky "corrupt but technically still
usable" corner case....

Cheers,

Dave.
Cheng Lin Oct. 12, 2023, 11:05 a.m. UTC | #6
> On Thu, Oct 12, 2023 at 08:08:20AM +1100, Dave Chinner wrote:
> > On Wed, Oct 11, 2023 at 01:33:50PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > >
> > > The VFS inc_nlink function does not explicitly check for integer
> > > overflows in the i_nlink field.  Instead, it checks the link count
> > > against s_max_links in the vfs_{link,create,rename} functions.  XFS
> > > sets the maximum link count to 2.1 billion, so integer overflows should
> > > not be a problem.
> > >
> > > However.  It's possible that online repair could find that a file has
> > > more than four billion links, particularly if the link count got
> > > corrupted while creating hardlinks to the file.  The di_nlinkv2 field is
> > > not large enough to store a value larger than 2^32, so we ought to
> > > define a magic pin value of ~0U which means that the inode never gets
> > > deleted.  This will prevent a UAF error if the repair finds this
> > > situation and users begin deleting links to the file.
> > >
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > ---
> > >  fs/xfs/libxfs/xfs_format.h |    6 ++++++
> > >  fs/xfs/scrub/nlinks.c      |    8 ++++----
> > >  fs/xfs/scrub/repair.c      |   12 ++++++------
> > >  fs/xfs/xfs_inode.c         |   28 +++++++++++++++++++++++-----
> > >  4 files changed, 39 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> > > index 6409dd22530f2..320522b887bb3 100644
> > > --- a/fs/xfs/libxfs/xfs_format.h
> > > +++ b/fs/xfs/libxfs/xfs_format.h
> > > @@ -896,6 +896,12 @@ static inline uint xfs_dinode_size(int version)
> > >   */
> > >  #define    XFS_MAXLINK        ((1U << 31) - 1U)
> > >
> > > +/*
> > > + * Any file that hits the maximum ondisk link count should be pinned to avoid
> > > + * a use-after-free situation.
> > > + */
> > > +#define XFS_NLINK_PINNED    (~0U)
> > > +
> > >  /*
> > >   * Values for di_format
> > >   *
> > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > index 4db2c2a6538d6..30604e11182c4 100644
> > > --- a/fs/xfs/xfs_inode.c
> > > +++ b/fs/xfs/xfs_inode.c
> > > @@ -910,15 +910,25 @@ xfs_init_new_inode(
> > >   */
> > >  static int            /* error */
> > >  xfs_droplink(
> > > -    xfs_trans_t *tp,
> > > -    xfs_inode_t *ip)
> > > +    struct xfs_trans    *tp,
> > > +    struct xfs_inode    *ip)
> > >  {
> > > +    struct inode        *inode = VFS_I(ip);
> > > +
> > >      xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
> > >
> > > -    drop_nlink(VFS_I(ip));
> > > +    if (inode->i_nlink == 0) {
> > > +        xfs_info_ratelimited(tp->t_mountp,
> > > + "Inode 0x%llx link count dropped below zero.  Pinning link count.",
> > > +                ip->i_ino);
> > > +        set_nlink(inode, XFS_NLINK_PINNED);
> > > +    }
> > > +    if (inode->i_nlink != XFS_NLINK_PINNED)
> > > +        drop_nlink(inode);
> > > +
> > >      xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> >
> > I think the di_nlink field now needs to be checked by verifiers to
> > ensure the value is in the range of:
> >
> >     (0 <= di_nlink <= XFS_MAXLINKS || di_nlink == XFS_NLINK_PINNED)
> >
> > And we need to ensure that in xfs_bumplink() - or earlier (to avoid
> > dirty transaction cancel shutdowns) - that adding a count to di_nlink is
> > not going to exceed XFS_MAXLINKS....
> I think the VFS needs to check that unlinking a nondirectory won't
> underflow its link count, and that rmdiring an (empty) subdirectory
> won't underflow the link counts of the parent or child.
> Cheng Lin, would you please fix all the filesystems at once instead of
> just XFS?
Since this is FS infrastructure, a change there may have a significant impact.
> --D

Patch

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 6409dd22530f2..320522b887bb3 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -896,6 +896,12 @@  static inline uint xfs_dinode_size(int version)
  */
 #define	XFS_MAXLINK		((1U << 31) - 1U)
 
+/*
+ * Any file that hits the maximum ondisk link count should be pinned to avoid
+ * a use-after-free situation.
+ */
+#define XFS_NLINK_PINNED	(~0U)
+
 /*
  * Values for di_format
  *
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 4db2c2a6538d6..30604e11182c4 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -910,15 +910,25 @@  xfs_init_new_inode(
  */
 static int			/* error */
 xfs_droplink(
-	xfs_trans_t *tp,
-	xfs_inode_t *ip)
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip)
 {
+	struct inode		*inode = VFS_I(ip);
+
 	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
 
-	drop_nlink(VFS_I(ip));
+	if (inode->i_nlink == 0) {
+		xfs_info_ratelimited(tp->t_mountp,
+ "Inode 0x%llx link count dropped below zero.  Pinning link count.",
+				ip->i_ino);
+		set_nlink(inode, XFS_NLINK_PINNED);
+	}
+	if (inode->i_nlink != XFS_NLINK_PINNED)
+		drop_nlink(inode);
+
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
-	if (VFS_I(ip)->i_nlink)
+	if (inode->i_nlink)
 		return 0;
 
 	return xfs_iunlink(tp, ip);
@@ -932,9 +942,17 @@  xfs_bumplink(
 	struct xfs_trans	*tp,
 	struct xfs_inode	*ip)
 {
+	struct inode		*inode = VFS_I(ip);
+
 	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
 
-	inc_nlink(VFS_I(ip));
+	if (inode->i_nlink == XFS_NLINK_PINNED - 1)
+		xfs_info_ratelimited(tp->t_mountp,
+ "Inode 0x%llx link count exceeded maximum.  Pinning link count.",
+				ip->i_ino);
+	if (inode->i_nlink != XFS_NLINK_PINNED)
+		inc_nlink(inode);
+
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 }
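Read as a userspace model (plain integers standing in for i_nlink; this is
not the kernel code itself), the pinning semantics of the two hunks above
come down to: a drop at zero pins the count, and a pinned count is never
moved again in either direction:

```c
#include <stdint.h>

#define XFS_MAXLINK		((1U << 31) - 1U)
#define XFS_NLINK_PINNED	(~0U)

/* Model of the patched xfs_droplink(): a zero count pins the inode,
 * and a pinned count is never decremented. */
static void model_droplink(uint32_t *nlink)
{
	if (*nlink == 0)
		*nlink = XFS_NLINK_PINNED;
	if (*nlink != XFS_NLINK_PINNED)
		(*nlink)--;
}

/* Model of the patched xfs_bumplink(): a pinned count is never
 * incremented, so it cannot wrap back around to zero. */
static void model_bumplink(uint32_t *nlink)
{
	if (*nlink != XFS_NLINK_PINNED)
		(*nlink)++;
}
```

Note that an increment from XFS_NLINK_PINNED - 1 lands on the pin value and
stays there, which is exactly the overflow case the bumplink hunk warns about.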