diff mbox

fs: add the FIGETFROZEN ioctl call

Message ID CABRjmp_FK8PYyAhBjAqE3Er8gfPo+EkJA5+yzVg7SdBYEwjJeA@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Florian Margaine April 14, 2016, 7:57 a.m. UTC
This lets userland get the filesystem freezing status, aka whether the
filesystem is frozen or not. This is so that an application can know if
it should freeze the filesystem or if it isn't necessary when taking a
snapshot.
---
 fs/compat_ioctl.c       |  1 +
 fs/ioctl.c              | 13 +++++++++++++
 include/uapi/linux/fs.h |  1 +
 3 files changed, 15 insertions(+)

Comments

Mateusz Guzik April 15, 2016, 12:50 a.m. UTC | #1
On Thu, Apr 14, 2016 at 09:57:07AM +0200, Florian Margaine wrote:
> This lets userland get the filesystem freezing status, aka whether the
> filesystem is frozen or not. This is so that an application can know if
> it should freeze the filesystem or if it isn't necessary when taking a
> snapshot.

The feature may be useful in general, I don't know.

However, I'm confused why programs would depend on it. If you froze
a particular subsystem, you don't have to check. If you did not, what
prevents whoever originaly froze it from unfreezing as you access it?

As such, maybe the feature you are looking for would count how many
times the fs is frozen.
Dave Chinner April 15, 2016, 2:17 a.m. UTC | #2
On Thu, Apr 14, 2016 at 09:57:07AM +0200, Florian Margaine wrote:
> This lets userland get the filesystem freezing status, aka whether the
> filesystem is frozen or not. This is so that an application can know if
> it should freeze the filesystem or if it isn't necessary when taking a
> snapshot.

freezing nests, so there is no reason for avoiding a freeze when
doing a snapshot. Indeed, if you don't wrap freeze/thaw around a
snapshot, then if the fs is thawed while the snapshot is in progress
then you are going to get a corrupt snapshot....

And, besides, polling for frozenness from userspace is inherently
racy - by the time the syscall returns, the information may be
incorrect, so you can't rely on it for decision making purposes in
userspace.

> +static int ioctl_fsgetfrozen(struct file *filp)
> +{
> + struct super_block *sb = file_inode(filp)->i_sb;
> +
> + if (!capable(CAP_SYS_ADMIN))
> + return -EPERM;
> +
> + return sb->s_writers.frozen;

This makes the internal freeze implementation states part of the
userspace ABI.  This needs an API that is separate from the internal
implementation...

Cheers,

Dave.
Florian Margaine April 16, 2016, 12:18 p.m. UTC | #3
On Fri, 2016-04-15 at 12:17 +1000, Dave Chinner wrote:
> On Thu, Apr 14, 2016 at 09:57:07AM +0200, Florian Margaine wrote:
> > This lets userland get the filesystem freezing status, aka whether
> > the
> > filesystem is frozen or not. This is so that an application can
> > know if
> > it should freeze the filesystem or if it isn't necessary when
> > taking a
> > snapshot.
> 
> freezing nests, so there is no reason for avoiding a freeze when
> doing a snapshot. Indeed, if you don't wrap freeze/thaw around a
> snapshot, then if the fs is thawed while the snapshot is in progress
> then you are going to get a corrupt snapshot....
> 
> And, besides, polling for frozenness from userspace is inherently
> racy - by the time the syscall returns, the information may be
> incorrect, so you can't rely on it for decision making purposes in
> userspace.

The example I have is mostly about unfreezing. If I freeze, make a
snapshot, then unfreeze. If my program crashes after the
snapshot/before unfreezing, I'll be left with a frozen filesystem,
which is undesirable, at best.

One way to mitigate this is to keep the state of the snapshot, but it
seems silly to keep this information separately (e.g. in a database)
when the kernel has it available.

> 
> > +static int ioctl_fsgetfrozen(struct file *filp)
> > +{
> > + struct super_block *sb = file_inode(filp)->i_sb;
> > +
> > + if (!capable(CAP_SYS_ADMIN))
> > + return -EPERM;
> > +
> > + return sb->s_writers.frozen;
> 
> This makes the internal freeze implementation states part of the
> userspace ABI.  This needs an API that is separate from the internal
> implementation...

I'm not sure I understand, do you mean it should be e.g.:

return sb->s_writers.frozen ? 1 : 0;

?

> 
> Cheers,
> 
> Dave.

Regards,
Florian
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara April 17, 2016, 3:05 p.m. UTC | #4
On Sat 16-04-16 14:18:19, Florian Margaine wrote:
> On Fri, 2016-04-15 at 12:17 +1000, Dave Chinner wrote:
> > On Thu, Apr 14, 2016 at 09:57:07AM +0200, Florian Margaine wrote:
> > > This lets userland get the filesystem freezing status, aka whether
> > > the
> > > filesystem is frozen or not. This is so that an application can
> > > know if
> > > it should freeze the filesystem or if it isn't necessary when
> > > taking a
> > > snapshot.
> > 
> > freezing nests, so there is no reason for avoiding a freeze when
> > doing a snapshot. Indeed, if you don't wrap freeze/thaw around a
> > snapshot, then if the fs is thawed while the snapshot is in progress
> > then you are going to get a corrupt snapshot....
> > 
> > And, besides, polling for frozenness from userspace is inherently
> > racy - by the time the syscall returns, the information may be
> > incorrect, so you can't rely on it for decision making purposes in
> > userspace.
> 
> The example I have is mostly about unfreezing. If I freeze, make a
> snapshot, then unfreeze. If my program crashes after the
> snapshot/before unfreezing, I'll be left with a frozen filesystem,
> which is undesirable, at best.
> 
> One way to mitigate this is to keep the state of the snapshot, but it
> seems silly to keep this information separately (e.g. in a database)
> when the kernel has it available.

Well, administrator can always unfreeze manually if he sees some glitch
like you described above has happened. And as Dave said, the interface you
propose is racy - how would you recognize the following two situations?

1) Your app has crashed with frozen fs.

2) Some other application is just snapshotting the fs and will unfreeze it
eventually.

In the first case you need additional thaw, in the second case you must not
thaw the fs. That's why we kept the decision about thawing the fs to the
administrator...

								Honza
Eric Sandeen April 18, 2016, 3:20 p.m. UTC | #5
On 4/14/16 10:17 PM, Dave Chinner wrote:
> On Thu, Apr 14, 2016 at 09:57:07AM +0200, Florian Margaine wrote:
>> This lets userland get the filesystem freezing status, aka whether the
>> filesystem is frozen or not. This is so that an application can know if
>> it should freeze the filesystem or if it isn't necessary when taking a
>> snapshot.
> 
> freezing nests, so there is no reason for avoiding a freeze when
> doing a snapshot. 

Sadly, no:

# xfs_freeze -f /mnt/test
# xfs_freeze -f /mnt/test
xfs_freeze: cannot freeze filesystem at /mnt/test: Device or resource busy

It used to, but it was broken^Wchanged quite some time ago.

> Indeed, if you don't wrap freeze/thaw around a
> snapshot, then if the fs is thawed while the snapshot is in progress
> then you are going to get a corrupt snapshot....

Yep.

IMHO what really needs to happen is to fix freeze to allow nesting
again.

A way to query freeze state might be nice, I think, but yeah, it's
racy, so you can't depend on it - but it might be useful in the "huh,
IO is failing, what's going on?  Oh, it's frozen, ok" scenario...

But if you want to be sure your snapshot is OK even while others are
running concurrent snapshots, we need nested freezes to work again.

-Eric

> And, besides, polling for frozenness from userspace is inherently
> racy - by the time the syscall returns, the information may be
> incorrect, so you can't rely on it for decision making purposes in
> userspace.
> 
>> +static int ioctl_fsgetfrozen(struct file *filp)
>> +{
>> + struct super_block *sb = file_inode(filp)->i_sb;
>> +
>> + if (!capable(CAP_SYS_ADMIN))
>> + return -EPERM;
>> +
>> + return sb->s_writers.frozen;
> 
> This makes the internal freeze implementation states part of the
> userspace ABI.  This needs an API that is separate from the internal
> implementation...
> 
> Cheers,
> 
> Dave.
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Margaine April 18, 2016, 5:20 p.m. UTC | #6
On Mon, 2016-04-18 at 11:20 -0400, Eric Sandeen wrote:
> 
> On 4/14/16 10:17 PM, Dave Chinner wrote:
> > On Thu, Apr 14, 2016 at 09:57:07AM +0200, Florian Margaine wrote:
> > > This lets userland get the filesystem freezing status, aka
> > > whether the
> > > filesystem is frozen or not. This is so that an application can
> > > know if
> > > it should freeze the filesystem or if it isn't necessary when
> > > taking a
> > > snapshot.
> > 
> > freezing nests, so there is no reason for avoiding a freeze when
> > doing a snapshot. 
> 
> Sadly, no:
> 
> # xfs_freeze -f /mnt/test
> # xfs_freeze -f /mnt/test
> xfs_freeze: cannot freeze filesystem at /mnt/test: Device or resource
> busy
> 
> It used to, but it was broken^Wchanged quite some time ago.

I guess I can provide a patch for this. Silently making it a no-op if
the FS is already frozen in ioctl_fsfreeze() should be good enough?

> 
> > Indeed, if you don't wrap freeze/thaw around a
> > snapshot, then if the fs is thawed while the snapshot is in
> > progress
> > then you are going to get a corrupt snapshot....
> 
> Yep.
> 
> IMHO what really needs to happen is to fix freeze to allow nesting
> again.
> 
> A way to query freeze state might be nice, I think, but yeah, it's
> racy, so you can't depend on it - but it might be useful in the "huh,
> IO is failing, what's going on?  Oh, it's frozen, ok" scenario...

I guess my original use case was a bit contrived, but that or simply
monitoring would be glad to have this method, indeed.

> 
> But if you want to be sure your snapshot is OK even while others are
> running concurrent snapshots, we need nested freezes to work again.
> 
> -Eric
> 
> > And, besides, polling for frozenness from userspace is inherently
> > racy - by the time the syscall returns, the information may be
> > incorrect, so you can't rely on it for decision making purposes in
> > userspace.
> > 
> > > +static int ioctl_fsgetfrozen(struct file *filp)
> > > +{
> > > + struct super_block *sb = file_inode(filp)->i_sb;
> > > +
> > > + if (!capable(CAP_SYS_ADMIN))
> > > + return -EPERM;
> > > +
> > > + return sb->s_writers.frozen;
> > 
> > This makes the internal freeze implementation states part of the
> > userspace ABI.  This needs an API that is separate from the
> > internal
> > implementation...
> > 
> > Cheers,
> > 
> > Dave.
> > 

Regards,
Florian

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen April 18, 2016, 5:47 p.m. UTC | #7
On 4/18/16 1:20 PM, Florian Margaine wrote:
> On Mon, 2016-04-18 at 11:20 -0400, Eric Sandeen wrote:
>> > 
>> > On 4/14/16 10:17 PM, Dave Chinner wrote:
>>> > > On Thu, Apr 14, 2016 at 09:57:07AM +0200, Florian Margaine wrote:
>>>> > > > This lets userland get the filesystem freezing status, aka
>>>> > > > whether the
>>>> > > > filesystem is frozen or not. This is so that an application can
>>>> > > > know if
>>>> > > > it should freeze the filesystem or if it isn't necessary when
>>>> > > > taking a
>>>> > > > snapshot.
>>> > > 
>>> > > freezing nests, so there is no reason for avoiding a freeze when
>>> > > doing a snapshot. 
>> > 
>> > Sadly, no:
>> > 
>> > # xfs_freeze -f /mnt/test
>> > # xfs_freeze -f /mnt/test
>> > xfs_freeze: cannot freeze filesystem at /mnt/test: Device or resource
>> > busy
>> > 
>> > It used to, but it was broken^Wchanged quite some time ago.
> I guess I can provide a patch for this. Silently making it a no-op if
> the FS is already frozen in ioctl_fsfreeze() should be good enough?

That doesn't work, because you can go:

Process A: Freeze
Process A: Start snapshotting
Process B: Freeze; already frozen, no-op
Process B: Start snapshotting
Process A: Snapshot done, unfreeze
Process B: Now B is snapshotting an unfrozen filesystem

Freeze needs to be nested so that two freeze calls require two thaw
calls to make the filesystem active again, so that any given process
calling freeze can be reasonably sure that it will stay frozen until
it calls unfreeze at some later point.

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner April 18, 2016, 11:06 p.m. UTC | #8
On Mon, Apr 18, 2016 at 11:20:22AM -0400, Eric Sandeen wrote:
> 
> 
> On 4/14/16 10:17 PM, Dave Chinner wrote:
> > On Thu, Apr 14, 2016 at 09:57:07AM +0200, Florian Margaine wrote:
> >> This lets userland get the filesystem freezing status, aka whether the
> >> filesystem is frozen or not. This is so that an application can know if
> >> it should freeze the filesystem or if it isn't necessary when taking a
> >> snapshot.
> > 
> > freezing nests, so there is no reason for avoiding a freeze when
> > doing a snapshot. 
> 
> Sadly, no:
> 
> # xfs_freeze -f /mnt/test
> # xfs_freeze -f /mnt/test
> xfs_freeze: cannot freeze filesystem at /mnt/test: Device or resource busy
> 
> It used to, but it was broken^Wchanged quite some time ago.

Ugh. Block device freeze nesting still works (i.e. freeze_bdev, as
snapshots from DM would use), but I didn't realise (or had
forgetten) that superblock level freeze nesting had been removed...

> > Indeed, if you don't wrap freeze/thaw around a
> > snapshot, then if the fs is thawed while the snapshot is in progress
> > then you are going to get a corrupt snapshot....
> 
> Yep.
> 
> IMHO what really needs to happen is to fix freeze to allow nesting
> again.

Probably. I quick dig shows nesting was intentionally broken more
than 5 years ago in making the freeze ioctl work on btrfs. 

commit 18e9e5104fcd9a973ffe3eed3816c87f2a1b6cd2
Author: Josef Bacik <josef@redhat.com>
Date:   Tue Mar 23 10:34:56 2010 -0400

    Introduce freeze_super and thaw_super for the fsfreeze ioctl
.....
    The only new gotcha is multiple calls to the fsfreeze ioctl will return EBUSY if
    the fs is already frozen.  I thought this was a better solution than adding a
    freeze counter to the super_block, but if everybody hates this idea I'm open to
    suggestions.  Thanks,
....

Not sure many people noticed that at the time....

> A way to query freeze state might be nice, I think, but yeah, it's
> racy, so you can't depend on it - but it might be useful in the "huh,
> IO is failing, what's going on?  Oh, it's frozen, ok" scenario...

So maybe we should just add the frozen state to /proc/self/mountinfo
or something similar, then people who think it matters can shoot
themselves in the foot all they want without us needing to care
about it.

Cheers,

Dave.
Florian Margaine April 22, 2016, 9:53 p.m. UTC | #9
On Tue, Apr 19, 2016 at 1:06 AM, Dave Chinner <david@fromorbit.com> wrote:
>> A way to query freeze state might be nice, I think, but yeah, it's
>> racy, so you can't depend on it - but it might be useful in the "huh,
>> IO is failing, what's going on?  Oh, it's frozen, ok" scenario...
>
> So maybe we should just add the frozen state to /proc/self/mountinfo
> or something similar, then people who think it matters can shoot
> themselves in the foot all they want without us needing to care
> about it.

Sorry if it's a basic question, but what is the difference between
/proc/self/mountinfo and providing an ioctl call?

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com


Regards,
Florian
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner April 22, 2016, 11:14 p.m. UTC | #10
On Fri, Apr 22, 2016 at 11:53:48PM +0200, Florian Margaine wrote:
> On Tue, Apr 19, 2016 at 1:06 AM, Dave Chinner <david@fromorbit.com> wrote:
> >> A way to query freeze state might be nice, I think, but yeah, it's
> >> racy, so you can't depend on it - but it might be useful in the "huh,
> >> IO is failing, what's going on?  Oh, it's frozen, ok" scenario...
> >
> > So maybe we should just add the frozen state to /proc/self/mountinfo
> > or something similar, then people who think it matters can shoot
> > themselves in the foot all they want without us needing to care
> > about it.
> 
> Sorry if it's a basic question, but what is the difference between
> /proc/self/mountinfo and providing an ioctl call?

Simply this:

$ grep xfs /proc/self/mountinfo
22 0 9:0 / / rw,relatime - xfs /dev/block/9:0 rw,attr2,inode64,sunit=1024,swidth=2048,noquota

i.e. no need for a custom binary to query state in diagnostic
situations where it might matter. And it shows up in things like SOS
reports that distro's gather when users report issues.

Cheers,

Dave.
diff mbox

Patch

diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index bd01b92..d2173ab 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -918,6 +918,7 @@  COMPATIBLE_IOCTL(FIGETBSZ)
 COMPATIBLE_IOCTL(FIFREEZE)
 COMPATIBLE_IOCTL(FITHAW)
 COMPATIBLE_IOCTL(FITRIM)
+COMPATIBLE_IOCTL(FIGETFROZEN)
 COMPATIBLE_IOCTL(KDGETKEYCODE)
 COMPATIBLE_IOCTL(KDSETKEYCODE)
 COMPATIBLE_IOCTL(KDGKBTYPE)
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 116a333..249ed20 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -568,6 +568,16 @@  static int ioctl_fsthaw(struct file *filp)
  return thaw_super(sb);
 }

+static int ioctl_fsgetfrozen(struct file *filp)
+{
+ struct super_block *sb = file_inode(filp)->i_sb;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ return sb->s_writers.frozen;
+}
+
 static long ioctl_file_dedupe_range(struct file *file, void __user *arg)
 {
  struct file_dedupe_range __user *argp = arg;
@@ -652,6 +662,9 @@  int do_vfs_ioctl(struct file *filp, unsigned int
fd, unsigned int cmd,
  error = ioctl_fsthaw(filp);
  break;

+ case FIGETFROZEN:
+ return put_user(ioctl_fsgetfrozen(filp), argp);
+
  case FS_IOC_FIEMAP:
  return ioctl_fiemap(filp, arg);

diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 149bec8..d48f19c 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -230,6 +230,7 @@  struct fsxattr {
 #define FIFREEZE _IOWR('X', 119, int) /* Freeze */
 #define FITHAW _IOWR('X', 120, int) /* Thaw */
 #define FITRIM _IOWR('X', 121, struct fstrim_range) /* Trim */
+#define FIGETFROZEN _IOWR('X', 122, int) /* Frozen status */
 #define FICLONE _IOW(0x94, 9, int)
 #define FICLONERANGE _IOW(0x94, 13, struct file_clone_range)
 #define FIDEDUPERANGE _IOWR(0x94, 54, struct file_dedupe_range)