Message ID: CAOQ4uxjEUAR1-1ikZSN2e99gK4k_mEMnhGGYvw=2Zj6oLOPVuQ@mail.gmail.com (mailing list archive)
State: New, archived
On Fri, Nov 03, 2017 at 02:19:15PM +0200, Amir Goldstein wrote:
> Whether or not we implement physical shrink is not the main issue.
> The main issue is implicitly changing the meaning of an existing API.

I don't intend on changing the existing API. I said "there are bugs
in the RFC code, and I'm addressing them".

> This is how I propose to smooth in the new API with as little pain as
> possible for existing deployments, yet making sure that "usable block"
> is only modified by new programs that intend to modify it:
>
> ----------------------
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 8f22fc579dbb..922798ebf3e8 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -171,10 +171,25 @@ xfs_growfs_data_private(
>  	xfs_rfsblock_t		nfree;
>  	xfs_agnumber_t		oagcount;
>  	int			pct;
> +	unsigned		ver;
>  	xfs_trans_t		*tp;
>
>  	nb = in->newblocks;
> -	pct = in->imaxpct;
> +	pct = in->imaxpct & 0xff;
> +	ver = in->imaxpct >> 8;
> +	if (ver > 1)
> +		return -EINVAL;
> +
> +	/*
> +	 * V0 API is only allowed to grow sb_usable_dblocks along with
> +	 * sb_dblocks. Any other change to sb_usable_dblocks requires
> +	 * V1 API to prove that userspace is aware of usable_dblocks.
> +	 */
> +	if (ver == 0 && xfs_sb_version_hasthinspace(mp->m_sb) &&
> +	    (mp->m_sb.sb_usable_dblocks != mp->m_sb.sb_dblocks ||
> +	     nb < mp->m_sb.sb_dblocks))
> +		return -EINVAL;
> +
>  	if (nb < mp->m_sb.sb_dblocks || pct < 0 || pct > 100)
>  		return -EINVAL;
>
> ----------------
>
> When xfs_growfs WANTS to change the size of a fs known to be thin, it
> should set in->imaxpct |= 1 << 8;
>
> Dave,
>
> Is that the complication of implementation you were talking about?
> Really?

No, it's not. Seems like everyone is still yelling over me instead of
listening.

Let's start with a demonstration. I'm going to make a thin filesystem
on a kernel running my current patchset, then I'm going to use an old
xfs_growfs binary (from 4.12) to try to grow and shrink that
filesystem.
So, create, mount:

$ sudo ~/packages/mkfs.xfs -f -dthin,size=1g /dev/pmem0
Default configuration sourced from package build definitions
meta-data=/dev/pmem0             isize=512    agcount=8, agsize=262144 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
data     =                       bsize=4096   blocks=2097152, imaxpct=25, thinblocks=262144
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$ sudo mount /dev/pmem0 /mnt/test
$

Let's now use the old growfs to look at the filesystem:

$ sudo xfs_growfs -V
xfs_growfs version 4.12.0
$ sudo xfs_info /mnt/test
meta-data=/dev/pmem0             isize=512    agcount=8, agsize=262144 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0 rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=262144, imaxpct=25, thin=0
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$

It reports as a 1GB filesystem, but has 8 AGs of 1GB each. Clearly a
thinspace filesystem, but growfs doesn't know that. The kernel reports
its size as:

$ df -h /mnt/test
Filesystem      Size  Used Avail Use% Mounted on
/dev/pmem0     1006M   33M  974M   4% /mnt/test
$

Now, let's grow it:

$ sudo xfs_growfs -D 500000 /mnt/test
meta-data=/dev/pmem0             isize=512    agcount=8, agsize=262144 blks
[...]
data blocks changed from 262144 to 500000
$

And the kernel reported size:

$ df -h /mnt/test
Filesystem      Size  Used Avail Use% Mounted on
/dev/pmem0      1.9G   33M  1.9G   2% /mnt/test
$

And xfs_info for good measure:

$ sudo xfs_info /mnt/test
meta-data=/dev/pmem0             isize=512    agcount=8, agsize=262144 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0 rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=500000, imaxpct=25, thin=0
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$

The thinspace grow succeeded exactly as it should - it grew to 500k
blocks without changing the physical filesystem. Thinspace is
completely transparent to the grow operation now, and will work with
old growfs binaries and apps just fine.

So, onto shrink with an old growfs binary:

$ sudo xfs_growfs -D 400000 /mnt/test
meta-data=/dev/pmem0             isize=512    agcount=8, agsize=262144 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0 rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=500000, imaxpct=25, thin=0
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data size 400000 too small, old size is 500000
$

Oh, look, the old growfs binary refused to shrink the filesystem. It
didn't even call into the filesystem - it just assumes that you can't
shrink XFS filesystems and doesn't even try.

So, this means old userspace and new thinspace filesystems are pretty
safe without having to change the interface. And it tells us that the
main change the userspace growfs code needs is to allow shrink to
proceed.
Hence, the new XFS_IOC_FSGEOMETRY ioctl and this change:

-	if (!error && dsize < geo.datablocks) {
+	if (dsize < geo.datablocks && !thin_enabled) {
 		fprintf(stderr, _("data size %lld too small,"
 			" old size is %lld\n"),
 			(long long)dsize, (long long)geo.datablocks);

So, now the growfs binary can only shrink filesystems that report as
thinspace filesystems. It will still refuse to shrink filesystems that
cannot be shrunk. With a new binary:

$ sudo xfs_growfs -V
xfs_growfs version 4.13.1
$ sudo xfs_growfs -D 400000 /mnt/test
meta-data=/dev/pmem0             isize=512    agcount=8, agsize=262144 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0 rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=500000, imaxpct=25, thin=1
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 500000 to 400000
$ df -h /mnt/test
Filesystem      Size  Used Avail Use% Mounted on
/dev/pmem0      1.6G   33M  1.5G   3% /mnt/test
$

We can see it shrinks a thinspace filesystem just fine. A normal thick
filesystem:

$ sudo ~/packages/mkfs.xfs /dev/pmem1
Default configuration sourced from package build definitions
meta-data=/dev/pmem1             isize=512    agcount=4, agsize=524288 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
data     =                       bsize=4096   blocks=2097152, imaxpct=25, thinblocks=0
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$ sudo mount /dev/pmem1 /mnt/scratch
$ sudo xfs_growfs -D 400000 /mnt/scratch
[....]
data size 400000 too small, old size is 2097152
$

Fails in exactly the same way as it did before.
IOWs, the only thing growfs required was awareness that it could shrink
a filesystem (a XFS_IOC_FSGEOMETRY change) and then, without any other
changes, it can operate sanely on thinspace filesystems. There is no
need to change the actual XFS_IOC_FSGROWDATA ioctl, because it doesn't
care what the underlying implementation does or supports....

----

That's a demonstration that the growfs interface does not need to
change for userspace to retain sane, predictable, obvious behaviour
across both old and new growfs binaries with the introduction of
thinspace filesystems. People calling the growfs ioctl directly without
adding thinspace awareness will just see some filesystems succeed when
shrinking - they'll need to add checks for the thinspace flag in the
XFS_IOC_FSGEOMETRY results if they want to do anything smarter, but
otherwise thinspace grow and shrink will just /work as expected/ with
existing binaries.

Amir, my previous comments about unnecessary interface complication are
based on these observations. You seem to be making an underlying
assumption that existing binaries can't grow/shrink thinspace
filesystems without modification. This assumption is incorrect.

Indeed, code complexity isn't the issue here - the issue is the
smearing of implementation details into an abstract control command
that is completely independent of the underlying implementations.
Indeed, userspace does not need to be aware of whether the filesystem
is a thinspace filesystem or not - it just needs to know the current
size, so that if it doesn't need to change the size (e.g. an imaxpct
change) it puts the right value into the growfs control structure. That
is the bit I got wrong in the RFC code I posted - the
XFS_IOC_FSGEOMETRY ioctl changes put the wrong size in geo->datablocks
for thinspace filesystems.
I've said this multiple times now, but the message doesn't seem to have
sunk in - if we put the right data in XFS_IOC_FSGEOMETRY, then
userspace doesn't need to care about whether it's a thin or a thick
filesystem that it is operating on. However, if we start changing the
growfs ioctl because it [apparently] needs to be aware of thin/thick
filesystems, we introduce kernel/userspace compatibility issues that
currently don't exist. And, as I've clearly demonstrated, those
interface differences *do not need to exist* for both old and new
growfs binaries to work correctly with thinspace filesystems.

That's the "complication of implementation" I was referring to -
introducing interface changes that would result in extra changes to
userspace binaries, and as such introduce user visible binary
compatibility issues that users do not need to (and should not) be
exposed to. Not to mention other application developers that might be
using the existing geometry and growfs ioctls - shrink will now just
work on existing binaries without them having to do anything....

> Don't you see that this is the right thing to do w.r.t. API design?

No, I don't, because you're trying to solve a problem that, quite
simply, doesn't exist.

Cheers,

Dave.
On Mon, Nov 6, 2017 at 3:16 AM, Dave Chinner <david@fromorbit.com> wrote:
...
> Not to mention other application developers that might
> be using the existing geometry and growfs ioctls - shrink will now

Acknowledging that those "other applications" may exist in the wild
makes it even harder to claim that allowing a change to usable_dblocks
with the existing API is not going to cause pain for users...

> just work on existing binaries without them having to do
> anything....
>
>> Don't you see that this is the right thing to do w.r.t. API design?
>
> No, I don't, because you're trying to solve a problem that, quite
> simply, doesn't exist.
>

It is *very* possible that you are right, but you have not proven that
the problem does not exist. You have proven that the problem does not
exist w.r.t. old xfs_growfs -D <size>, and you correctly claimed that
the problem with old xfs_growfs -m <imaxpct> is an implementation bug
in the RFC patches.

Let me give an example that will demonstrate my concern.

One of our older NAS products, still deployed with many customers, has
an LVM based volume manager and an ext3 file system. When a user
changes the size of a volume via the Web UI, lower level commands will
resize the LVM volume and then run resize2fs to max size. Because
"resize2fs to max size" is not an atomic operation, and because this is
a "for dummies" product, in order to recover from a "half-resize" there
is a post-mount script that runs resize2fs unconditionally after boot.

So in this product, the LVM volume size is treated as an "intent log"
for the file system grow operation. I find it hard to believe that this
practice is so novel that nobody else ever used it - and for that
matter, with an xfs file system over LVM and xfs_growfs -d.
Now imagine you upgrade such a system to a kernel that supports
"thinspace" and new xfsprogs and create thin file systems, and then
downgrade the system to a kernel that still supports "thinspace", but
xfsprogs that do not (or even a proprietary system component that uses
the XFS_IOC_FSGROWDATA ioctl to perform the "auto-grow"). The result
will be that all the thin file systems will "auto-grow" to the thick
size of the volume.

So the way I see it, my proposal to explicitly require the
XFS_IOC_FSGROWDATA V1 API for any change to usable_dblocks that is not
coupled with the same change to dblocks is meant to resolve
userspace/kernel compatibility issues. And I fail to see how that
requirement makes it hard to maintain userspace/kernel compatibility:

- xfs_growfs needs to check for the "thinspace" flag and, if it
  exists, use the V1 API
- an old kernel can't mount a "thinspace" fs, so it can never see the
  V1 API except from a buggy program, which will get -EINVAL
- old xfs_growfs will keep failing to shrink even a thin fs
- old xfs_growfs will succeed to grow, except (*) for a thin fs that
  was previously shrunk

(*) That exception relates to the example I described above, and we
seem to not be in agreement about the desired behavior. IIUC, you like
the fact that old xfs_growfs can grow a thin and shrunk fs, where I see
trouble lurking in this behavior.

So we can agree to disagree on the desired behavior, but for the
record, this and only this point is the API design flaw I am talking
about.

There may be complexities w.r.t. maintaining userspace/kernel
compatibility with the proposed solution. I trust you on this because
you have far more experience than me with maintaining the historic
baggage of wrongly designed APIs. If no one else is concerned about the
old xfs_growfs -d use case, and no one else shares my opinion about the
desired behavior in that use case, then I withdraw my claims.

Thanks,
Amir.
-- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Nov 06, 2017 at 11:48:05AM +0200, Amir Goldstein wrote:
> On Mon, Nov 6, 2017 at 3:16 AM, Dave Chinner <david@fromorbit.com> wrote:
> > just work on existing binaries without them having to do
> > anything....
> >
> >> Don't you see that this is the right thing to do w.r.t. API design?
> >
> > No, I don't, because you're trying to solve a problem that, quite
> > simply, doesn't exist.
> >
>
> It is *very* possible that you are right, but you have not proven that
> the problem does not exist. You have proven that the problem
> does not exist w.r.t. old xfs_growfs -D <size> and you correctly
> claimed that the problem with old xfs_growfs -m <imaxpct> is an
> implementation bug in the RFC patches.
>
> Let me give an example that will demonstrate my concern.
>
> One of our older NAS products, still deployed with many customers,
> has an LVM based volume manager and an ext3 file system.
> When a user changes the size of a volume via the Web UI, lower level
> commands will resize LVM and then resize2fs to max size.
> Because "resize2fs to max size" is not an atomic operation and
> because this is a "for dummies" product, in order to recover from
> "half-resize", there is a post-mount script that runs resize2fs
> unconditionally after boot.

Sure, but if you have a product using thinspace filesystems then you
are going to need to do something different. To think you can just plug
a thinspace filesystem into an existing stack and have it work
unmodified is naive at best.

Indeed, with a thinspace filesystem, the LVM volume *never needs
resizing*, because it will be created far larger than ever required in
the first place. And recovering from a crash half way through a
thinspace growfs operation? Well, it's all done through the filesystem
journal because it's all done through transactions. IOWs, thinspace
filesystems solve this "atomic resize" problem without needing crutches
or external constructs to hide non-atomic operations.

Essentially, if you have a storage product and you plug thinspace
filesystems into it without management modifications, then you get to
keep all the bits that break and don't work correctly. All I care about
is that stupid things don't happen if someone has integrated thinspace
filesystems correctly and then a user does this:

> Now imagine you upgrade such a system to a kernel that supports
> "thinspace" and new xfsprogs and create thin file systems, and then
> downgrade the system to a kernel that still supports "thinspace", but
> xfsprogs that do not (or even a proprietary system component that
> uses the XFS_IOC_FSGROWDATA ioctl to perform the "auto-grow").

In this case the thinspace filesystems will behave exactly like a
physical filesystem, i.e. they will physically grow to the size of the
underlying device. I can't stop this from happening, but I can ensure
that it doesn't do irreversible damage and that it's reversible as soon
as userspace is restored to support thinspace administration again.
i.e. just shrink it back down to the required thin size, and it's like
that grow never occurred...

i.e. it's not the end of the world, and you can recover cleanly from it
without any issues.

> The results will be that all the thin file systems will all "auto-grow"
> to the thick size of the volume.

Of course it will - the user/app/admin asked the kernel to grow the
filesystem to the size of the underlying device. I don't know what you
expect a thinspace filesystem to do here other than *grow the
filesystem to the size of the underlying device*.

> So we can agree to disagree on the desired behavior, but for the
> record, this and only this point is the API design flaw I am talking
> about.

What flaw? You've come up with a scenario where the thinspace
filesystem will behave like a physical filesystem because that's what
the thinspace-unaware admin program expects it to be. i.e. thinspace is
completely transparent to userspace, and a physical grow does not
prevent subsequent shrinks back to the original thinspace size.

None of this indicates that there is an interface problem...

Cheers,

Dave.
On Mon, Nov 6, 2017 at 11:46 PM, Dave Chinner <david@fromorbit.com> wrote:
[...]
>> commands will resize LVM and then resize2fs to max size.
>> Because "resize2fs to max size" is not an atomic operation and
>> because this is a "for dummies" product, in order to recover from
>> "half-resize", there is a post-mount script that runs resize2fs
>> unconditionally after boot.
>
> Sure, but if you have a product using thinspace filesystems then you
> are going to need to do something different. To think you can just
> plug a thinspace filesystem into an existing stack and have it work
> unmodified is naive at best.
>

I do not expect it to "work" - on the contrary, I expect it not to
work, just the same as xfs_repair will refuse to repair an xfs with
unknown feature flags.

[...]
>
>> Now imagine you upgrade such a system to a kernel that supports
>> "thinspace" and new xfsprogs and create thin file systems, and then
>> downgrade the system to a kernel that still supports "thinspace", but
>> xfsprogs that do not (or even a proprietary system component that
>> uses the XFS_IOC_FSGROWDATA ioctl to perform the "auto-grow").
>
> In this case the thinspace filesystems will behave exactly like a
> physical filesystem. i.e. they will physically grow to the size of
> the underlying device. I can't stop this from happening, but I can
> ensure that it doesn't do irreversible damage and that it's
> reversible as soon as userspace is restored to support thinspace
> administration again. i.e. just shrink it back down to the required
> thin size, and it's like that grow never occurred...
>
> i.e. it's not the end of the world, and you can recover cleanly from
> it without any issues.
>

Very true, not the end of the world. That is why your design is
something I can "live with". But it does have the potential to cause
pain downstream in the future, and I just don't see any reason why this
pain cannot be avoided. I fail to see the downside of not allowing old
xfs_growfs to modify thin space.

>> The results will be that all the thin file systems will all "auto-grow"
>> to the thick size of the volume.
>
> Of course it will - the user/app/admin asked the kernel to grow the
> filesystem to the size of the underlying device. I don't know what
> you expect a thinspace filesystem to do here other than *grow the
> filesystem to the size of the underlying device*.

I expect the kernel to tell the user EINVAL and warn that the user
needs newer xfsprogs to auto grow thin space.

Let me re-iterate the requirement we are disagreeing on:

- old xfs_growfs will succeed to grow, *except* for a thin fs that was
  previously shrunk (i.e. dblocks != usable_dblocks)

You explained at length why the exception is not a must. I do not
remember a single argument that explains what's wrong with keeping the
exception. I claimed that this exception can reduce pain for end users.
In response, you wrote that the "user/app/admin asked to grow fs to
maximum size" and, in so many words, that they can "keep the pieces".

What bad things can happen if the clueless user/app/admin is refused a
grow of the fs to maximum size? The practice of "not sure you know what
you are doing so please keep away" has been a very good practice for
xfs and file systems for years. Why not abide by this law in this case?

Cheers,
Amir.
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 8f22fc579dbb..922798ebf3e8 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -171,10 +171,25 @@ xfs_growfs_data_private(
 	xfs_rfsblock_t		nfree;
 	xfs_agnumber_t		oagcount;
 	int			pct;
+	unsigned		ver;
 	xfs_trans_t		*tp;
 
 	nb = in->newblocks;
-	pct = in->imaxpct;
+	pct = in->imaxpct & 0xff;
+	ver = in->imaxpct >> 8;
+	if (ver > 1)
+		return -EINVAL;
+
+	/*
+	 * V0 API is only allowed to grow sb_usable_dblocks along with
+	 * sb_dblocks. Any other change to sb_usable_dblocks requires
+	 * V1 API to prove that userspace is aware of usable_dblocks.
+	 */
+	if (ver == 0 && xfs_sb_version_hasthinspace(mp->m_sb) &&
+	    (mp->m_sb.sb_usable_dblocks != mp->m_sb.sb_dblocks ||
+	     nb < mp->m_sb.sb_dblocks))
+		return -EINVAL;
+
 	if (nb < mp->m_sb.sb_dblocks || pct < 0 || pct > 100)
 		return -EINVAL;