Message ID | CABRjmp_FK8PYyAhBjAqE3Er8gfPo+EkJA5+yzVg7SdBYEwjJeA@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Apr 14, 2016 at 09:57:07AM +0200, Florian Margaine wrote: > This lets userland get the filesystem freezing status, aka whether the > filesystem is frozen or not. This is so that an application can know if > it should freeze the filesystem or if it isn't necessary when taking a > snapshot. The feature may be useful in general, I don't know. However, I'm confused why programs would depend on it. If you froze a particular subsystem, you don't have to check. If you did not, what prevents whoever originaly froze it from unfreezing as you access it? As such, maybe the feature you are looking for would count how many times the fs is frozen.
On Thu, Apr 14, 2016 at 09:57:07AM +0200, Florian Margaine wrote: > This lets userland get the filesystem freezing status, aka whether the > filesystem is frozen or not. This is so that an application can know if > it should freeze the filesystem or if it isn't necessary when taking a > snapshot. freezing nests, so there is no reason for avoiding a freeze when doing a snapshot. Indeed, if you don't wrap freeze/thaw around a snapshot, then if the fs is thawed while the snapshot is in progress then you are going to get a corrupt snapshot.... And, besides, polling for frozenness from userspace is inherently racy - by the time the syscall returns, the information may be incorrect, so you can't rely on it for decision making purposes in userspace. > +static int ioctl_fsgetfrozen(struct file *filp) > +{ > + struct super_block *sb = file_inode(filp)->i_sb; > + > + if (!capable(CAP_SYS_ADMIN)) > + return -EPERM; > + > + return sb->s_writers.frozen; This makes the internal freeze implementation states part of the userspace ABI. This needs an API that is separate from the internal implementation... Cheers, Dave.
On Fri, 2016-04-15 at 12:17 +1000, Dave Chinner wrote: > On Thu, Apr 14, 2016 at 09:57:07AM +0200, Florian Margaine wrote: > > This lets userland get the filesystem freezing status, aka whether > > the > > filesystem is frozen or not. This is so that an application can > > know if > > it should freeze the filesystem or if it isn't necessary when > > taking a > > snapshot. > > freezing nests, so there is no reason for avoiding a freeze when > doing a snapshot. Indeed, if you don't wrap freeze/thaw around a > snapshot, then if the fs is thawed while the snapshot is in progress > then you are going to get a corrupt snapshot.... > > And, besides, polling for frozenness from userspace is inherently > racy - by the time the syscall returns, the information may be > incorrect, so you can't rely on it for decision making purposes in > userspace. The example I have is mostly about unfreezing. If I freeze, make a snapshot, then unfreeze. If my program crashes after the snapshot/before unfreezing, I'll be left with a frozen filesystem, which is undesirable, at best. One way to mitigate this is to keep the state of the snapshot, but it seems silly to keep this information separately (e.g. in a database) when the kernel has it available. > > > +static int ioctl_fsgetfrozen(struct file *filp) > > +{ > > + struct super_block *sb = file_inode(filp)->i_sb; > > + > > + if (!capable(CAP_SYS_ADMIN)) > > + return -EPERM; > > + > > + return sb->s_writers.frozen; > > This makes the internal freeze implementation states part of the > userspace ABI. This needs an API that is separate from the internal > implementation... I'm not sure I understand, do you mean it should be e.g.: return sb->s_writers.frozen ? 1 : 0; ? > > Cheers, > > Dave. Regards, Florian -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat 16-04-16 14:18:19, Florian Margaine wrote: > On Fri, 2016-04-15 at 12:17 +1000, Dave Chinner wrote: > > On Thu, Apr 14, 2016 at 09:57:07AM +0200, Florian Margaine wrote: > > > This lets userland get the filesystem freezing status, aka whether > > > the > > > filesystem is frozen or not. This is so that an application can > > > know if > > > it should freeze the filesystem or if it isn't necessary when > > > taking a > > > snapshot. > > > > freezing nests, so there is no reason for avoiding a freeze when > > doing a snapshot. Indeed, if you don't wrap freeze/thaw around a > > snapshot, then if the fs is thawed while the snapshot is in progress > > then you are going to get a corrupt snapshot.... > > > > And, besides, polling for frozenness from userspace is inherently > > racy - by the time the syscall returns, the information may be > > incorrect, so you can't rely on it for decision making purposes in > > userspace. > > The example I have is mostly about unfreezing. If I freeze, make a > snapshot, then unfreeze. If my program crashes after the > snapshot/before unfreezing, I'll be left with a frozen filesystem, > which is undesirable, at best. > > One way to mitigate this is to keep the state of the snapshot, but it > seems silly to keep this information separately (e.g. in a database) > when the kernel has it available. Well, administrator can always unfreeze manually if he sees some glitch like you described above has happened. And as Dave said, the interface you propose is racy - how would you recognize the following two situations? 1) Your app has crashed with frozen fs. 2) Some other application is just snapshotting the fs and will unfreeze it eventually. In the first case you need additional thaw, in the second case you must not thaw the fs. That's why we kept the decision about thawing the fs to the administrator... Honza
On 4/14/16 10:17 PM, Dave Chinner wrote: > On Thu, Apr 14, 2016 at 09:57:07AM +0200, Florian Margaine wrote: >> This lets userland get the filesystem freezing status, aka whether the >> filesystem is frozen or not. This is so that an application can know if >> it should freeze the filesystem or if it isn't necessary when taking a >> snapshot. > > freezing nests, so there is no reason for avoiding a freeze when > doing a snapshot. Sadly, no: # xfs_freeze -f /mnt/test # xfs_freeze -f /mnt/test xfs_freeze: cannot freeze filesystem at /mnt/test: Device or resource busy It used to, but it was broken^Wchanged quite some time ago. > Indeed, if you don't wrap freeze/thaw around a > snapshot, then if the fs is thawed while the snapshot is in progress > then you are going to get a corrupt snapshot.... Yep. IMHO what really needs to happen is to fix freeze to allow nesting again. A way to query freeze state might be nice, I think, but yeah, it's racy, so you can't depend on it - but it might be useful in the "huh, IO is failing, what's going on? Oh, it's frozen, ok" scenario... But if you want to be sure your snapshot is OK even while others are running concurrent snapshots, we need nested freezes to work again. -Eric > And, besides, polling for frozenness from userspace is inherently > racy - by the time the syscall returns, the information may be > incorrect, so you can't rely on it for decision making purposes in > userspace. > >> +static int ioctl_fsgetfrozen(struct file *filp) >> +{ >> + struct super_block *sb = file_inode(filp)->i_sb; >> + >> + if (!capable(CAP_SYS_ADMIN)) >> + return -EPERM; >> + >> + return sb->s_writers.frozen; > > This makes the internal freeze implementation states part of the > userspace ABI. This needs an API that is separate from the internal > implementation... > > Cheers, > > Dave. > -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2016-04-18 at 11:20 -0400, Eric Sandeen wrote: > > On 4/14/16 10:17 PM, Dave Chinner wrote: > > On Thu, Apr 14, 2016 at 09:57:07AM +0200, Florian Margaine wrote: > > > This lets userland get the filesystem freezing status, aka > > > whether the > > > filesystem is frozen or not. This is so that an application can > > > know if > > > it should freeze the filesystem or if it isn't necessary when > > > taking a > > > snapshot. > > > > freezing nests, so there is no reason for avoiding a freeze when > > doing a snapshot. > > Sadly, no: > > # xfs_freeze -f /mnt/test > # xfs_freeze -f /mnt/test > xfs_freeze: cannot freeze filesystem at /mnt/test: Device or resource > busy > > It used to, but it was broken^Wchanged quite some time ago. I guess I can provide a patch for this. Silently making it a no-op if the FS is already frozen in ioctl_fsfreeze() should be good enough? > > > Indeed, if you don't wrap freeze/thaw around a > > snapshot, then if the fs is thawed while the snapshot is in > > progress > > then you are going to get a corrupt snapshot.... > > Yep. > > IMHO what really needs to happen is to fix freeze to allow nesting > again. > > A way to query freeze state might be nice, I think, but yeah, it's > racy, so you can't depend on it - but it might be useful in the "huh, > IO is failing, what's going on? Oh, it's frozen, ok" scenario... I guess my original use case was a bit contrived, but that or simply monitoring would be glad to have this method, indeed. > > But if you want to be sure your snapshot is OK even while others are > running concurrent snapshots, we need nested freezes to work again. > > -Eric > > > And, besides, polling for frozenness from userspace is inherently > > racy - by the time the syscall returns, the information may be > > incorrect, so you can't rely on it for decision making purposes in > > userspace. > > > > > +static int ioctl_fsgetfrozen(struct file *filp) > > > +{ > > > + struct super_block *sb = file_inode(filp)->i_sb; > > > + > > > + if (!capable(CAP_SYS_ADMIN)) > > > + return -EPERM; > > > + > > > + return sb->s_writers.frozen; > > > > This makes the internal freeze implementation states part of the > > userspace ABI. This needs an API that is separate from the > > internal > > implementation... > > > > Cheers, > > > > Dave. > > Regards, Florian -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 4/18/16 1:20 PM, Florian Margaine wrote: > On Mon, 2016-04-18 at 11:20 -0400, Eric Sandeen wrote: >> > >> > On 4/14/16 10:17 PM, Dave Chinner wrote: >>> > > On Thu, Apr 14, 2016 at 09:57:07AM +0200, Florian Margaine wrote: >>>> > > > This lets userland get the filesystem freezing status, aka >>>> > > > whether the >>>> > > > filesystem is frozen or not. This is so that an application can >>>> > > > know if >>>> > > > it should freeze the filesystem or if it isn't necessary when >>>> > > > taking a >>>> > > > snapshot. >>> > > >>> > > freezing nests, so there is no reason for avoiding a freeze when >>> > > doing a snapshot. >> > >> > Sadly, no: >> > >> > # xfs_freeze -f /mnt/test >> > # xfs_freeze -f /mnt/test >> > xfs_freeze: cannot freeze filesystem at /mnt/test: Device or resource >> > busy >> > >> > It used to, but it was broken^Wchanged quite some time ago. > I guess I can provide a patch for this. Silently making it a no-op if > the FS is already frozen in ioctl_fsfreeze() should be good enough? That doesn't work, because you can go: Process A: Freeze Process A: Start snapshotting Process B: Freeze; already frozen, no-op Process B: Start snapshotting Process A: Snapshot done, unfreeze Process B: Now B is snapshotting an unfrozen filesystem Freeze needs to be nested so that two freeze calls require two thaw calls to make the filesystem active again, so that any given process calling freeze can be reasonably sure that it will stay frozen until it calls unfreeze at some later point. -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Apr 18, 2016 at 11:20:22AM -0400, Eric Sandeen wrote: > > > On 4/14/16 10:17 PM, Dave Chinner wrote: > > On Thu, Apr 14, 2016 at 09:57:07AM +0200, Florian Margaine wrote: > >> This lets userland get the filesystem freezing status, aka whether the > >> filesystem is frozen or not. This is so that an application can know if > >> it should freeze the filesystem or if it isn't necessary when taking a > >> snapshot. > > > > freezing nests, so there is no reason for avoiding a freeze when > > doing a snapshot. > > Sadly, no: > > # xfs_freeze -f /mnt/test > # xfs_freeze -f /mnt/test > xfs_freeze: cannot freeze filesystem at /mnt/test: Device or resource busy > > It used to, but it was broken^Wchanged quite some time ago. Ugh. Block device freeze nesting still works (i.e. freeze_bdev, as snapshots from DM would use), but I didn't realise (or had forgetten) that superblock level freeze nesting had been removed... > > Indeed, if you don't wrap freeze/thaw around a > > snapshot, then if the fs is thawed while the snapshot is in progress > > then you are going to get a corrupt snapshot.... > > Yep. > > IMHO what really needs to happen is to fix freeze to allow nesting > again. Probably. I quick dig shows nesting was intentionally broken more than 5 years ago in making the freeze ioctl work on btrfs. commit 18e9e5104fcd9a973ffe3eed3816c87f2a1b6cd2 Author: Josef Bacik <josef@redhat.com> Date: Tue Mar 23 10:34:56 2010 -0400 Introduce freeze_super and thaw_super for the fsfreeze ioctl ..... The only new gotcha is multiple calls to the fsfreeze ioctl will return EBUSY if the fs is already frozen. I thought this was a better solution than adding a freeze counter to the super_block, but if everybody hates this idea I'm open to suggestions. Thanks, .... Not sure many people noticed that at the time.... > A way to query freeze state might be nice, I think, but yeah, it's > racy, so you can't depend on it - but it might be useful in the "huh, > IO is failing, what's going on? Oh, it's frozen, ok" scenario... So maybe we should just add the frozen state to /proc/self/mountinfo or something similar, then people who think it matters can shoot themselves in the foot all they want without us needing to care about it. Cheers, Dave.
On Tue, Apr 19, 2016 at 1:06 AM, Dave Chinner <david@fromorbit.com> wrote: >> A way to query freeze state might be nice, I think, but yeah, it's >> racy, so you can't depend on it - but it might be useful in the "huh, >> IO is failing, what's going on? Oh, it's frozen, ok" scenario... > > So maybe we should just add the frozen state to /proc/self/mountinfo > or something similar, then people who think it matters can shoot > themselves in the foot all they want without us needing to care > about it. Sorry if it's a basic question, but what is the difference between /proc/self/mountinfo and providing an ioctl call? > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com Regards, Florian -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Apr 22, 2016 at 11:53:48PM +0200, Florian Margaine wrote: > On Tue, Apr 19, 2016 at 1:06 AM, Dave Chinner <david@fromorbit.com> wrote: > >> A way to query freeze state might be nice, I think, but yeah, it's > >> racy, so you can't depend on it - but it might be useful in the "huh, > >> IO is failing, what's going on? Oh, it's frozen, ok" scenario... > > > > So maybe we should just add the frozen state to /proc/self/mountinfo > > or something similar, then people who think it matters can shoot > > themselves in the foot all they want without us needing to care > > about it. > > Sorry if it's a basic question, but what is the difference between > /proc/self/mountinfo and providing an ioctl call? Simply this: $ grep xfs /proc/self/mountinfo 22 0 9:0 / / rw,relatime - xfs /dev/block/9:0 rw,attr2,inode64,sunit=1024,swidth=2048,noquota i.e. no need for a custom binary to query state in diagnostic situations where it might matter. And it shows up in things like SOS reports that distro's gather when users report issues. Cheers, Dave.
diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c index bd01b92..d2173ab 100644 --- a/fs/compat_ioctl.c +++ b/fs/compat_ioctl.c @@ -918,6 +918,7 @@ COMPATIBLE_IOCTL(FIGETBSZ) COMPATIBLE_IOCTL(FIFREEZE) COMPATIBLE_IOCTL(FITHAW) COMPATIBLE_IOCTL(FITRIM) +COMPATIBLE_IOCTL(FIGETFROZEN) COMPATIBLE_IOCTL(KDGETKEYCODE) COMPATIBLE_IOCTL(KDSETKEYCODE) COMPATIBLE_IOCTL(KDGKBTYPE) diff --git a/fs/ioctl.c b/fs/ioctl.c index 116a333..249ed20 100644 --- a/fs/ioctl.c +++ b/fs/ioctl.c @@ -568,6 +568,16 @@ static int ioctl_fsthaw(struct file *filp) return thaw_super(sb); } +static int ioctl_fsgetfrozen(struct file *filp) +{ + struct super_block *sb = file_inode(filp)->i_sb; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + return sb->s_writers.frozen; +} + static long ioctl_file_dedupe_range(struct file *file, void __user *arg) { struct file_dedupe_range __user *argp = arg; @@ -652,6 +662,9 @@ int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd, error = ioctl_fsthaw(filp); break; + case FIGETFROZEN: + return put_user(ioctl_fsgetfrozen(filp), argp); + case FS_IOC_FIEMAP: return ioctl_fiemap(filp, arg); diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h index 149bec8..d48f19c 100644 --- a/include/uapi/linux/fs.h +++ b/include/uapi/linux/fs.h @@ -230,6 +230,7 @@ struct fsxattr { #define FIFREEZE _IOWR('X', 119, int) /* Freeze */ #define FITHAW _IOWR('X', 120, int) /* Thaw */ #define FITRIM _IOWR('X', 121, struct fstrim_range) /* Trim */ +#define FIGETFROZEN _IOWR('X', 122, int) /* Frozen status */ #define FICLONE _IOW(0x94, 9, int) #define FICLONERANGE _IOW(0x94, 13, struct file_clone_range) #define FIDEDUPERANGE _IOWR(0x94, 54, struct file_dedupe_range)