Message ID | 1495755004-17036-1-git-send-email-hyc.lee@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hyunchul, Am 26.05.2017 um 01:30 schrieb Hyunchul Lee: > From: Hyunchul Lee <cheol.lee@lge.com> > > for un/freeze support, implement freeze_super and un/freeze_fs > of super_operations. > ubifs_freeze_super just calls freeze_super. because freeze_super always > succeeds if file system is read-only, UBIFS errors should be checked. > if there are errors, UBIFS is switched to read-only mode. > ubifs_freeze_fs runs commit if TNC/LPT isn't clean. though all writes > are blocked and sync_fs is called before, if commit alreay was started > before writes are blocked, TNC/LPT might have dirty COW nodes. you explain how you implement that feature, but not why. What is the use-case? I always thought this interface is only being used by LVM. Thanks, //richard
On Fri, May 26, 2017 at 11:52:42AM +0200, Richard Weinberger wrote: > Hyunchul, > > Am 26.05.2017 um 01:30 schrieb Hyunchul Lee: > > From: Hyunchul Lee <cheol.lee@lge.com> > > > > for un/freeze support, implement freeze_super and un/freeze_fs > > of super_operations. > > ubifs_freeze_super just calls freeze_super. because freeze_super always > > succeeds if file system is read-only, UBIFS errors should be checked. > > if there are errors, UBIFS is switched to read-only mode. > > ubifs_freeze_fs runs commit if TNC/LPT isn't clean. though all writes > > are blocked and sync_fs is called before, if commit alreay was started > > before writes are blocked, TNC/LPT might have dirty COW nodes. > > you explain how you implement that feature, but not why. > What is the use-case? > I always thought this interface is only being used by LVM. It can also be used through the FIFREEZE ioctl, but without snapshots underneath the fs it's not all that useful.
> +static int ubifs_freeze_super(struct super_block *sb) > +{ > + struct ubifs_info *c = sb->s_fs_info; > + int err; > + > + dbg_gen("starting"); > + /* freeze_super always succeeds if file system is in read-only. > + * however if there are errors, UBIFS is switched to read-only mode. > + * so @ro_error should be checked. > + */ > + err = freeze_super(sb); > + if (!err && c->ro_error) { > + thaw_super(sb); > + return -EIO; > + } > + return err; This is just broken. First ubifs should still properly propagate the errors, and second freezing/unfreezing read only file systems is perfectly valid, and third the freeze_super method is a special hack for gfs2 that should not gain additional users.
On Sat, May 27, 2017 at 01:23:38AM -0700, Christoph Hellwig wrote: > > +static int ubifs_freeze_super(struct super_block *sb) > > +{ > > + struct ubifs_info *c = sb->s_fs_info; > > + int err; > > + > > + dbg_gen("starting"); > > + /* freeze_super always succeeds if file system is in read-only. > > + * however if there are errors, UBIFS is switched to read-only mode. > > + * so @ro_error should be checked. > > + */ > > + err = freeze_super(sb); > > + if (!err && c->ro_error) { > > + thaw_super(sb); > > + return -EIO; > > + } > > + return err; > > This is just broken. First ubifs should still properly propagate > the errors, and second freezing/unfreezing read only file systems is > perfectly valid, it is right. > and third the freeze_super method is a special > hack for gfs2 that should not gain additional users. I thought that it was ok. because commit 48b6bca says "every filesystem that implements this hooks must call the vfs freeze_super ..." Thank you for comment. > > ______________________________________________________ > Linux MTD discussion mailing list > http://lists.infradead.org/mailman/listinfo/linux-mtd/
Hi, Richard. On Fri, May 26, 2017 at 11:52:42AM +0200, Richard Weinberger wrote: > Hyunchul, > > Am 26.05.2017 um 01:30 schrieb Hyunchul Lee: > > From: Hyunchul Lee <cheol.lee@lge.com> > > > > for un/freeze support, implement freeze_super and un/freeze_fs > > of super_operations. > > ubifs_freeze_super just calls freeze_super. because freeze_super always > > succeeds if file system is read-only, UBIFS errors should be checked. > > if there are errors, UBIFS is switched to read-only mode. > > ubifs_freeze_fs runs commit if TNC/LPT isn't clean. though all writes > > are blocked and sync_fs is called before, if commit alreay was started > > before writes are blocked, TNC/LPT might have dirty COW nodes. > > you explain how you implement that feature, but not why. > What is the use-case? > I always thought this interface is only being used by LVM. Sorry, I forgot this. I implement this to make a backup of some files, and support fsfreeze utility and SysRq's freeze/thaw commmand. > > Thanks, > //richard > > ______________________________________________________ > Linux MTD discussion mailing list > http://lists.infradead.org/mailman/listinfo/linux-mtd/
Hi, Richard. On Mon, May 29, 2017 at 09:43:46AM +0900, Hyunchul Lee wrote: > On Sat, May 27, 2017 at 01:23:38AM -0700, Christoph Hellwig wrote: > > > +static int ubifs_freeze_super(struct super_block *sb) > > > +{ > > > + struct ubifs_info *c = sb->s_fs_info; > > > + int err; > > > + > > > + dbg_gen("starting"); > > > + /* freeze_super always succeeds if file system is in read-only. > > > + * however if there are errors, UBIFS is switched to read-only mode. > > > + * so @ro_error should be checked. > > > + */ > > > + err = freeze_super(sb); > > > + if (!err && c->ro_error) { > > > + thaw_super(sb); > > > + return -EIO; > > > + } > > > + return err; > > > > This is just broken. First ubifs should still properly propagate > > the errors, and second freezing/unfreezing read only file systems is > > perfectly valid, > > it is right. if updating TNC is failed, ubifs might become inconsistant and be switched to read-only mode. for example, when ubifs_jnl_update is called to create a file, if inserting a znode for new inode is failed, TNC has only a znode for new dentry. and this can be only recoverd by replay. is it required to fix this? > > > and third the freeze_super method is a special > > hack for gfs2 that should not gain additional users. > > I thought that it was ok. because commit 48b6bca says "every filesystem > that implements this hooks must call the vfs freeze_super ..." > > Thank you for comment. > > > > ______________________________________________________ > > Linux MTD discussion mailing list > > http://lists.infradead.org/mailman/listinfo/linux-mtd/ > > -- > > Thanks, > Hyunchul
and I missed the following case. in some embedded systems, clean-up for shutdown should be fast. during this clean-up, freeze file system to guarantee integrity. umount with MNT_DETACH is not suitable because of not killing tasks. On Mon, May 29, 2017 at 10:18:34AM +0900, Hyunchul Lee wrote: > Hi, Richard. > > On Fri, May 26, 2017 at 11:52:42AM +0200, Richard Weinberger wrote: > > Hyunchul, > > > > Am 26.05.2017 um 01:30 schrieb Hyunchul Lee: > > > From: Hyunchul Lee <cheol.lee@lge.com> > > > > > > for un/freeze support, implement freeze_super and un/freeze_fs > > > of super_operations. > > > ubifs_freeze_super just calls freeze_super. because freeze_super always > > > succeeds if file system is read-only, UBIFS errors should be checked. > > > if there are errors, UBIFS is switched to read-only mode. > > > ubifs_freeze_fs runs commit if TNC/LPT isn't clean. though all writes > > > are blocked and sync_fs is called before, if commit alreay was started > > > before writes are blocked, TNC/LPT might have dirty COW nodes. > > > > you explain how you implement that feature, but not why. > > What is the use-case? > > I always thought this interface is only being used by LVM. > > Sorry, I forgot this. I implement this to make a backup of some files, and > support fsfreeze utility and SysRq's freeze/thaw commmand. > > > > > Thanks, > > //richard > > > > ______________________________________________________ > > Linux MTD discussion mailing list > > http://lists.infradead.org/mailman/listinfo/linux-mtd/ > > -- > > Thanks, > Hyunchul > > ______________________________________________________ > Linux MTD discussion mailing list > http://lists.infradead.org/mailman/listinfo/linux-mtd/
On Mon, May 29, 2017 at 7:40 AM, Hyunchul Lee <hyc.lee@gmail.com> wrote: > > and I missed the following case. > > in some embedded systems, clean-up for shutdown should be fast. > during this clean-up, freeze file system to guarantee integrity. > umount with MNT_DETACH is not suitable because of not killing tasks. > Interesting point. It seems that good old "sync; reboot" does not cut it anymore. Not even emergency remount read-only sysrq trigger. Some of you may have been following this thread on fsdevel: https://www.spinics.net/lists/linux-xfs/msg06953.html Probably less have been following this longer thread on xfs list: https://www.spinics.net/lists/linux-xfs/msg06883.html
Hyunchul, Am 29.05.2017 um 04:24 schrieb Hyunchul Lee: >>> This is just broken. First ubifs should still properly propagate >>> the errors, and second freezing/unfreezing read only file systems is >>> perfectly valid, >> >> it is right. > > if updating TNC is failed, ubifs might become inconsistant and be switched to > read-only mode. for example, when ubifs_jnl_update is called to create a file, > if inserting a znode for new inode is failed, TNC has only a znode for > new dentry. and this can be only recoverd by replay. > > is it required to fix this? UBIFS is designed to be power-cut tolerant. So, UBIFS must not corrupt in any case. Which failure are you facing? I have the feeling that you try to paper over some other issue. :-) Thanks, //richard
Amir, Hyunchul, Am 29.05.2017 um 07:40 schrieb Amir Goldstein: > On Mon, May 29, 2017 at 7:40 AM, Hyunchul Lee <hyc.lee@gmail.com> wrote: >> >> and I missed the following case. >> >> in some embedded systems, clean-up for shutdown should be fast. >> during this clean-up, freeze file system to guarantee integrity. >> umount with MNT_DETACH is not suitable because of not killing tasks. more details please, UBIFS is designed to survive a power-cut/reboot/etc. at any time. How would it corrupt? > Interesting point. It seems that good old "sync; reboot" does not cut > it anymore. > Not even emergency remount read-only sysrq trigger. > > Some of you may have been following this thread on fsdevel: > https://www.spinics.net/lists/linux-xfs/msg06953.html > > Probably less have been following this longer thread on xfs list: > https://www.spinics.net/lists/linux-xfs/msg06883.html Well, UBIFS is a bit different. The UBIFS journal is not an add-on feature, you have to replay it in any case. Otherwise you're facing corrupted data. What simple bootloaders often do is it replaying the journal only to memory but don't write back to the MTD. Thanks, //richard
On Mon, May 29, 2017 at 12:00 PM, Richard Weinberger <richard@nod.at> wrote: > Amir, Hyunchul, > > Am 29.05.2017 um 07:40 schrieb Amir Goldstein: >> On Mon, May 29, 2017 at 7:40 AM, Hyunchul Lee <hyc.lee@gmail.com> wrote: >>> >>> and I missed the following case. >>> >>> in some embedded systems, clean-up for shutdown should be fast. >>> during this clean-up, freeze file system to guarantee integrity. >>> umount with MNT_DETACH is not suitable because of not killing tasks. > > more details please, UBIFS is designed to survive a power-cut/reboot/etc. > at any time. How would it corrupt? > >> Interesting point. It seems that good old "sync; reboot" does not cut >> it anymore. >> Not even emergency remount read-only sysrq trigger. >> >> Some of you may have been following this thread on fsdevel: >> https://www.spinics.net/lists/linux-xfs/msg06953.html >> >> Probably less have been following this longer thread on xfs list: >> https://www.spinics.net/lists/linux-xfs/msg06883.html > > Well, UBIFS is a bit different. > The UBIFS journal is not an add-on feature, you have to replay it in > any case. Otherwise you're facing corrupted data. Yes, I suppose you are right. I guess there is no equivalent of mount -oro,{norecovery,noload} for ubifs. I don't know the ubifs journal implementation details. Can ubifs_run_commit() when writers are frozen contribute to shorter journal replay time after boot with some workloads? Amir.
Amir, Am 29.05.2017 um 12:04 schrieb Amir Goldstein: >> Well, UBIFS is a bit different. >> The UBIFS journal is not an add-on feature, you have to replay it in >> any case. Otherwise you're facing corrupted data. > > Yes, I suppose you are right. > I guess there is no equivalent of mount -oro,{norecovery,noload} for > ubifs. Correct. > I don't know the ubifs journal implementation details. > Can ubifs_run_commit() when writers are frozen contribute to > shorter journal replay time after boot with some workloads? If the journal is empty then mounting will be faster, yes. But I'm still interested in Hyunchul's use-case/problem. Usually you run UBIFS in an embedded environment where you simple never shutdown or reboot in a clean way. The power supply just cut off. On the other hand, if you want an empty journal for a faster mount, just make sure that you umount upon shutdown. Or make the size of the journal smaller at mkfs time. Also see: commit 27ad27993313312a4ad0047d0a944c425cd511a5 Author: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Date: Thu Jan 29 16:34:30 2009 +0200 UBIFS: remove fast unmounting This UBIFS feature has never worked properly, and it was a mistake to add it because we simply have no use-cases. So, lets still accept the fast_unmount mount option, but ignore it. This does not change much, because UBIFS commit in sync_fs anyway, and sync_fs is called while unmounting. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Thanks, //richard
On Mon, May 29, 2017 at 1:17 PM, Richard Weinberger <richard@nod.at> wrote: > Amir, > > Am 29.05.2017 um 12:04 schrieb Amir Goldstein: >>> Well, UBIFS is a bit different. >>> The UBIFS journal is not an add-on feature, you have to replay it in >>> any case. Otherwise you're facing corrupted data. >> >> Yes, I suppose you are right. >> I guess there is no equivalent of mount -oro,{norecovery,noload} for >> ubifs. > > Correct. > >> I don't know the ubifs journal implementation details. >> Can ubifs_run_commit() when writers are frozen contribute to >> shorter journal replay time after boot with some workloads? > > If the journal is empty then mounting will be faster, yes. > But I'm still interested in Hyunchul's use-case/problem. Makes sense. > Usually you run UBIFS in an embedded environment where you simple > never shutdown or reboot in a clean way. The power supply just > cut off. > > On the other hand, if you want an empty journal for a faster mount, > just make sure that you umount upon shutdown. So you see, unmount upon shutdown is not always an option when you are in a system where not all tasks are killed before attempting to unmount (or even attempting to remounr,ro). This is what Darrick's patch was all about. Maybe there is no such embedded system... With the numbers of different embedded systems going to infinity, the probability of no "such embedded system" is unlikely. FIFREEZE gives a privileged process the ability to checkpoint the journal (e.g. for shorter mount time) and reboot without having to kill all other processes in the system first. Amir.
On Mon, May 29, 2017 at 10:42:37AM +0200, Richard Weinberger wrote: > Hyunchul, > > Am 29.05.2017 um 04:24 schrieb Hyunchul Lee: > >>> This is just broken. First ubifs should still properly propagate > >>> the errors, and second freezing/unfreezing read only file systems is > >>> perfectly valid, > >> > >> it is right. > > > > if updating TNC is failed, ubifs might become inconsistant and be switched to > > read-only mode. for example, when ubifs_jnl_update is called to create a file, > > if inserting a znode for new inode is failed, TNC has only a znode for > > new dentry. and this can be only recoverd by replay. > > > > is it required to fix this? > > UBIFS is designed to be power-cut tolerant. > So, UBIFS must not corrupt in any case. > > Which failure are you facing? > > I have the feeling that you try to paper over some other issue. :-) The failure hasn't happened. I wondered the following situation should be handled. ubifs_create ubifs_jnl_update write_head ubifs_tnc_add_nm /* (1) add dentry to TNC */ ubifs_tnc_add /* (2) add new inode to TNC */ ubifs_tnc_add /* (3) add parent inode to TNC */ If ubifs_tnc_add(2) fails, TNC would have the index of a dentry which points to an invalid inode. So, though ubifs_readdir emits the dentry, this inode cannot be accessed. Becasue there isn't the index of the inode. I know this situation is hardly probable. But UBIFS would be read-only and inconsitant in this situation, until replay is completed. > > Thanks, > //richard
Hyunchul, Am 30.05.2017 um 04:37 schrieb Hyunchul Lee: >> UBIFS is designed to be power-cut tolerant. >> So, UBIFS must not corrupt in any case. >> >> Which failure are you facing? >> >> I have the feeling that you try to paper over some other issue. :-) > > The failure hasn't happened. I wondered the following situation > should be handled. > > ubifs_create > ubifs_jnl_update > write_head > ubifs_tnc_add_nm /* (1) add dentry to TNC */ > ubifs_tnc_add /* (2) add new inode to TNC */ > ubifs_tnc_add /* (3) add parent inode to TNC */ > > If ubifs_tnc_add(2) fails, TNC would have the index of a dentry > which points to an invalid inode. So, though ubifs_readdir > emits the dentry, this inode cannot be accessed. Becasue > there isn't the index of the inode. Well, to make ubifs_jnl_update() more robust wrt. such unlikely failures please rework the journal code. Adding freeze support does not fix the root cause. UBIFS treats unrecoverable errors in ubifs_jnl_* since ever as fatal. Thanks, //richard
diff --git a/fs/ubifs/commit.c b/fs/ubifs/commit.c index 63f5661..ab347ff 100644 --- a/fs/ubifs/commit.c +++ b/fs/ubifs/commit.c @@ -49,7 +49,7 @@ #include "ubifs.h" /* - * nothing_to_commit - check if there is nothing to commit. + * ubifs_nothing_to_commit - check if there is nothing to commit. * @c: UBIFS file-system description object * * This is a helper function which checks if there is anything to commit. It is @@ -65,7 +65,7 @@ * * This function returns %1 if there is nothing to commit and %0 otherwise. */ -static int nothing_to_commit(struct ubifs_info *c) +int ubifs_nothing_to_commit(struct ubifs_info *c) { /* * During mounting or remounting from R/O mode to R/W mode we may @@ -120,7 +120,7 @@ static int do_commit(struct ubifs_info *c) goto out_up; } - if (nothing_to_commit(c)) { + if (ubifs_nothing_to_commit(c)) { up_write(&c->commit_sem); err = 0; goto out_cancel; diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c index b73811b..16fc22c 100644 --- a/fs/ubifs/super.c +++ b/fs/ubifs/super.c @@ -356,6 +356,7 @@ static void ubifs_evict_inode(struct inode *inode) if (inode->i_nlink) goto done; + sb_start_intwrite(inode->i_sb); if (is_bad_inode(inode)) goto out; @@ -377,6 +378,7 @@ static void ubifs_evict_inode(struct inode *inode) c->bi.nospace = c->bi.nospace_rp = 0; smp_wmb(); } + sb_end_intwrite(inode->i_sb); done: clear_inode(inode); #ifdef CONFIG_UBIFS_FS_ENCRYPTION @@ -486,6 +488,64 @@ static int ubifs_sync_fs(struct super_block *sb, int wait) return ubi_sync(c->vi.ubi_num); } +static int ubifs_freeze_super(struct super_block *sb) +{ + struct ubifs_info *c = sb->s_fs_info; + int err; + + dbg_gen("starting"); + /* freeze_super always succeeds if file system is in read-only. + * however if there are errors, UBIFS is switched to read-only mode. + * so @ro_error should be checked. + */ + err = freeze_super(sb); + if (!err && c->ro_error) { + thaw_super(sb); + return -EIO; + } + return err; +} + +static int ubifs_freeze(struct super_block *sb) +{ + struct ubifs_info *c = sb->s_fs_info; + int ret; + + if (c->ro_error) + return -EIO; + + if (c->ro_mount) + return 0; + + down_write(&c->commit_sem); + ret = ubifs_nothing_to_commit(c); + up_write(&c->commit_sem); + + /* writes were blocked and ubifs_sync_fs was called before. + * but TNC/LPT isn't guarranteed to be clean. because if commit was + * already started before writes were blocked, TNC/LPT might have + * COW nodes. so we try to commit again in this case. + */ + if (!ret) { + ret = ubifs_run_commit(c); + if (ret) + return ret; + + down_write(&c->commit_sem); + ret = ubifs_nothing_to_commit(c); + up_write(&c->commit_sem); + if (!ret) + return -EINVAL; + } + + return 0; +} + +static int ubifs_unfreeze(struct super_block *sb) +{ + return 0; +} + /** * init_constants_early - initialize UBIFS constants. * @c: UBIFS file-system description object @@ -1889,6 +1949,9 @@ static int ubifs_remount_fs(struct super_block *sb, int *flags, char *data) .remount_fs = ubifs_remount_fs, .show_options = ubifs_show_options, .sync_fs = ubifs_sync_fs, + .freeze_super = ubifs_freeze_super, + .freeze_fs = ubifs_freeze, + .unfreeze_fs = ubifs_unfreeze, }; /** diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h index abdd116..545796e 100644 --- a/fs/ubifs/ubifs.h +++ b/fs/ubifs/ubifs.h @@ -1645,6 +1645,7 @@ unsigned long ubifs_shrink_count(struct shrinker *shrink, void ubifs_recovery_commit(struct ubifs_info *c); int ubifs_gc_should_commit(struct ubifs_info *c); void ubifs_wait_for_commit(struct ubifs_info *c); +int ubifs_nothing_to_commit(struct ubifs_info *c); /* master.c */ int ubifs_read_master(struct ubifs_info *c);