
[03/11] fs: add frozen sb state helpers

Message ID 20171201211327.GQ729@wotan.suse.de (mailing list archive)
State New, archived

Commit Message

Luis Chamberlain Dec. 1, 2017, 9:13 p.m. UTC
On Fri, Dec 01, 2017 at 12:47:24PM +0100, Jan Kara wrote:
> On Thu 30-11-17 20:05:48, Luis R. Rodriguez wrote:
> > On Thu, Nov 30, 2017 at 06:13:10PM +0100, Jan Kara wrote:
> > > ... I dislike the _by_user() suffix as there may be different places that
> > > call freeze_super() (e.g. device mapper does this during some operations).
> > > Clearly we need to distinguish "by system suspend" and "the other" cases.
> > > So please make this clear in the naming.
> > 
> > Ah. How about sb_frozen_by_cb() ?
> 
> And what does 'cb' stand for? :)

Callback. But let me think about bdev usage a bit and we can worry about the
bikeshedding later.

> > > In fact, what might be a cleaner solution is to introduce a 'freeze_count'
> > > for superblock freezing (we already do have this for block devices). Then
> > > you don't need to differentiate these two cases - but you'd still need to
> > > properly handle cleanup if freezing of all superblocks fails in the middle.
> > > So I'm not 100% this works out nicely in the end. But it's certainly worth
> > > a consideration.
> > 
> > Ah, there are three important reasons for doing it the way I did it which are
> > easy to miss, unless you read the commit log message very carefully.
> > 
> > 0) The ioctl interface causes a failure to be sent back to userspace if
> > you issue two consecutive freezes, or two thaws. Ie, once a filesystem is
> > frozen, a secondary call will result in an error. Likewise for thaw.
> 
> Yep. But also note that there's *another* interface to filesystem freezing
> which behaves differently - freeze_bdev() (used internally by dm). That
> interface uses the counter and freezing of already frozen device succeeds.

Ah... so freeze_bdev() semantics matches the same semantics I was looking
for.

> IOW it is a mess.

To say the least.

> We cannot change the behavior of the ioctl but we could
> still provide an in-kernel interface to freeze_super() with the same
> semantics as freeze_bdev() which might be easier to use by suspend - maybe
> we could call it 'exclusive' (for the current freeze_super() semantics) and
> 'non-exclusive' (for the freeze_bdev() semantics) since this is very much
> like O_EXCL open of block devices...

Sure, now typically I see we make exclusive calls with the postfix _excl() so
I take it you'd be OK with renaming freeze_super() to freeze_super_excl()
eventually then?

I totally missed freeze_bdev() otherwise I think I would have picked up on the
shared semantics stuff and I would have just made a helper out of what
freeze_bdev() uses, and then have both in-kernel and freeze_bdev() use it.

I'll note that it's still not perfectly clear whether the semantics behind
freeze_bdev() fully match what I described above. That still needs to be
vetted. For instance, does thaw_bdev() keep a superblock frozen if an
ioctl initiated freeze had occurred before? If so then great. Otherwise
I think we'll need to distinguish the ioctl interface. Worst possible case
is that bdev semantics and in-kernel semantics differ somehow, then that
will really create a holy fucking mess.

> > 1) The new iterate supers stuff I added bails on the first error and returns that
> > error. If we kept the ioctl() interface error scheme we'd be erroring out
> > on suspend if userspace had already frozen a filesystem. Clearly that'd
> > be silly so we need to distinguish between the automatic kernel freezing
> > and the old userspace ioctl initiated interface, so that we can keep the
> > old behaviour but allow in-kernel auto freeze on suspend to work properly.
> 
> This would work fine with the non-exclusive semantics I believe.

Groovy.

> > 2) If we fail to suspend we need to then thaw up all filesystems. The order
> > in which we try to freeze is in reverse order on the super_block list. If we
> > fail though we iterate in proper order on the super_block list and thaw. If
> > you had two filesystems this means that if a failure happened on freezing
> > the first filesystem, we'd first thaw the other filesystem -- and because of
> > 0) if we don't distinguish between the ioctl interface or auto freezing, we'd
> > also fail on thaw'ing given the other superblock wouldn't have been frozen.
> > 
> > So we need to keep two separate approaches. The count stuff would not suffice
> > to distinguish origin of source for freeze call.
> > 
> > Come to think of it, I think I forgot to avoid thaw if the freeze was ioctl
> > initiated..
> > 
> > thaw_unlocked(bool cb_call)
> > {
> >   if (sb_frozen_by_cb(sb) && !cb_call)
> >     return 0; /* skip as the user wanted to keep this fs frozen */
> >   ...
> > }
> > 
> > Even though the kernel auto call is new I think we need to keep ioctl initiated
> > frozen filesystems frozen to not break old userspace assumptions.
> > 
> > So, keeping all this in mind, does a count method still suffice?
> 
> The count method would need a different error recovery method - i.e. if you
> fail freezing filesystems somewhere in the middle of iteration through
> superblock list, you'd need to iterate from that point on to the superblock
> where you've started. This is somewhat more complicated than your approach
> but it seems cleaner to me:
> 
> 1) Function freezing all superblocks either succeeds and all superblocks
> are frozen or fails and no superblocks are (additionally) frozen.

To be clear, for now this would just be, all superblocks that support
freeze_fs() are frozen :)

> 2) It is not that normal users + one special user (who owns the "flag" in
> the superblock in form of a special freeze state) setup. We'd simply have
> exclusive and non-exclusive users of superblock freezing and there can be
> arbitrary numbers of them.

Sorry I did not understand this point. Can you rephrase perhaps a bit?

Anyway, I just tried implementing this and it seemed rather easy to
use a pivot, however note that then freeze_processes() which calls
fs_suspend_freeze() would somehow need to pass the failed sb... do
we want to let fs_suspend_freeze() take a parameter to be set
to the failed sb if it failed? Locking-wise this seems racy.

So I mean, adding support to thaw using a pivot, the failed sb, is
rather easy (see the patch below).

But we'd still need to give enough context to let thaw use the failed sb
as a pivot.
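
A userspace sketch of the pivot idea, for illustration only. It is not the
kernel code: struct toy_sb, toy_iterate_after() and toy_mark() are
hypothetical names, and the sketch ignores sb_lock, s_umount and reference
counting entirely, which is exactly where the racy part lives. It only shows
the skip-until-pivot walk and how the failed entry could be handed back to the
caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct super_block; only what the sketch needs. */
struct toy_sb {
	int id;
	int visited;
	struct toy_sb *next;
};

/* Example callback: record the visit and report success. */
static int toy_mark(struct toy_sb *sb)
{
	sb->visited = 1;
	return 0;
}

/*
 * Call f() on every entry that comes after 'pivot' in the list.
 * With pivot == NULL every entry is visited. Stops at the first error
 * and returns it; if 'failed' is non-NULL it is set to the failing entry.
 */
static int toy_iterate_after(struct toy_sb *head, struct toy_sb *pivot,
			     int (*f)(struct toy_sb *), struct toy_sb **failed)
{
	struct toy_sb *sb;
	int error;

	assert(f != NULL);
	for (sb = head; sb; sb = sb->next) {
		/* If we have a pivot, start work on the item after it. */
		if (pivot) {
			if (sb == pivot)
				pivot = NULL;
			continue;
		}
		error = f(sb);
		if (error) {
			if (failed)
				*failed = sb;
			return error;
		}
	}
	return 0;
}
```

With a three-entry list a -> b -> c and b as the pivot, only c is visited.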

  Luis

Comments

Jan Kara Dec. 21, 2017, 11:03 a.m. UTC | #1
Hello,

I think I owe you a reply here... Sorry that it took so long.

On Fri 01-12-17 22:13:27, Luis R. Rodriguez wrote:
> On Fri, Dec 01, 2017 at 12:47:24PM +0100, Jan Kara wrote:
> > On Thu 30-11-17 20:05:48, Luis R. Rodriguez wrote:
> > > > In fact, what might be a cleaner solution is to introduce a 'freeze_count'
> > > > for superblock freezing (we already do have this for block devices). Then
> > > > you don't need to differentiate these two cases - but you'd still need to
> > > > properly handle cleanup if freezing of all superblocks fails in the middle.
> > > > So I'm not 100% this works out nicely in the end. But it's certainly worth
> > > > a consideration.
> > > 
> > > Ah, there are three important reasons for doing it the way I did it which are
> > > easy to miss, unless you read the commit log message very carefully.
> > > 
> > > 0) The ioctl interface causes a failure to be sent back to userspace if
> > > you issue two consecutive freezes, or two thaws. Ie, once a filesystem is
> > > frozen, a secondary call will result in an error. Likewise for thaw.
> > 
> > Yep. But also note that there's *another* interface to filesystem freezing
> > which behaves differently - freeze_bdev() (used internally by dm). That
> > interface uses the counter and freezing of already frozen device succeeds.
> 
> Ah... so freeze_bdev() semantics matches the same semantics I was looking
> for.
> 
> > IOW it is a mess.
> 
> To say the least.
> 
> > We cannot change the behavior of the ioctl but we could
> > still provide an in-kernel interface to freeze_super() with the same
> > semantics as freeze_bdev() which might be easier to use by suspend - maybe
> > we could call it 'exclusive' (for the current freeze_super() semantics) and
> > 'non-exclusive' (for the freeze_bdev() semantics) since this is very much
> > like O_EXCL open of block devices...
> 
> Sure, now typically I see we make exclusive calls with the postfix _excl() so
> I take it you'd be OK in renaming freeze_super() freeze_super_excl() eventually
> then?

In principle yes but let's leave the naming disputes to a later time when
it is clear what API we actually want to provide.

> I totally missed freeze_bdev() otherwise I think I would have picked up on the
> shared semantics stuff and I would have just made a helper out of what
> freeze_bdev() uses, and then have both in-kernel and freeze_bdev() use it.
> 
> I'll note that its still not perfectly clear if really the semantics behind
> freeze_bdev() match what I described above fully. That still needs to be
> vetted for. For instance, does thaw_bdev() keep a superblock frozen if we
> an ioctl initiated freeze had occurred before? If so then great. Otherwise
> I think we'll need to distinguish the ioctl interface. Worst possible case
> is that bdev semantics and in-kernel semantics differ somehow, then that
> will really create a holy fucking mess.

I believe nobody really thought about mixing those two interfaces to fs
freezing and so the behavior is basically defined by the implementation.
That is:

freeze_bdev() on sb frozen by ioctl_fsfreeze() -> EBUSY
freeze_bdev() on sb frozen by freeze_bdev() -> success
ioctl_fsfreeze() on sb frozen by freeze_bdev() -> EBUSY
ioctl_fsfreeze() on sb frozen by ioctl_fsfreeze() -> EBUSY

thaw_bdev() on sb frozen by ioctl_fsfreeze() -> EINVAL
ioctl_fsthaw() on sb frozen by freeze_bdev() -> success
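
The table above can be reproduced with a small userspace model. This is a
sketch under assumptions, not the kernel implementation: struct toy_sb and the
toy_*() functions are hypothetical, one superblock sits on one block device,
and the "undo the count on failure" steps are my reading of how the counter
has to behave for the table to hold.

```c
#include <assert.h>
#include <errno.h>

/* Toy model: one superblock on one block device. Not kernel code. */
struct toy_sb {
	int frozen;          /* models "sb is not in the unfrozen state" */
	int bd_freeze_count; /* models bdev->bd_fsfreeze_count */
};

/* ioctl path: fails if the sb is already frozen, by whomever. */
static int toy_ioctl_fsfreeze(struct toy_sb *sb)
{
	if (sb->frozen)
		return -EBUSY;
	sb->frozen = 1;
	return 0;
}

/* ioctl path: thaws regardless of who froze; never looks at the counter. */
static int toy_ioctl_fsthaw(struct toy_sb *sb)
{
	if (!sb->frozen)
		return -EINVAL;
	sb->frozen = 0;
	return 0;
}

/* bdev path: nested calls only bump the counter. */
static int toy_freeze_bdev(struct toy_sb *sb)
{
	int error;

	if (++sb->bd_freeze_count > 1)
		return 0;
	error = toy_ioctl_fsfreeze(sb);
	if (error)
		sb->bd_freeze_count--;	/* assumption: undo on failure */
	return error;
}

/* bdev path: bails with -EINVAL when the counter says "no freezers". */
static int toy_thaw_bdev(struct toy_sb *sb)
{
	int error;

	assert(sb->bd_freeze_count >= 0);
	if (!sb->bd_freeze_count)
		return -EINVAL;
	if (--sb->bd_freeze_count > 0)
		return 0;
	error = toy_ioctl_fsthaw(sb);
	if (error)
		sb->bd_freeze_count++;	/* assumption: undo on failure */
	return error;
}
```

Note the last row of the table in this model: toy_ioctl_fsthaw() succeeds on a
superblock frozen through toy_freeze_bdev() and leaves the counter non-zero,
so the bdev user still believes the filesystem is frozen.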

What I propose is the following API:

freeze_super_excl()
  - freezes superblock, returns EBUSY if the superblock is already frozen
    (either by another freeze_super_excl() or by freeze_super())
freeze_super()
  - this function will make sure superblock is frozen when the function
    returns with success. It can be nested with other freeze_super() or
    freeze_super_excl() calls (this second part is different from how
    freeze_bdev() behaves currently but AFAICT this behavior is actually
    what all current users of freeze_bdev() really want - just make sure
    fs cannot be written to)
thaw_super()
  - counterpart to freeze_super(), would fail with EINVAL if we were to
    drop the last "freeze reference" but sb was actually frozen with
    freeze_super_excl()
thaw_super_excl()
  - counterpart to freeze_super_excl(). Fails with EINVAL if sb was not
    frozen with freeze_super_excl() (this is different to current behavior
    but I don't believe anyone relies on this and doing otherwise is asking
    for data corruption).

I'd implement it by a freeze counter in the superblock (similar to what we
currently have in bdev) where every call to freeze_super() or
freeze_super_excl() would add one. Additionally we'd have a flag in the
superblock whether the first freeze (it could not be any other since those
would fail with EBUSY) came from freeze_super_excl().

Then we could make ioctl interface use the _excl() variant of the freezing
API, freeze_bdev() would use the non-exclusive variant (we could drop the
freeze counter in bdev completely), your freezing on suspend could then use
the non-exclusive variant as well.
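
To make the counter-plus-flag idea concrete, here is a minimal userspace
sketch of the four calls. It is only a model of the semantics described above:
struct toy_sb and the toy_*() names are hypothetical, there is no locking, and
two details are my assumptions rather than anything stated in this mail:
thaw_super() on an unfrozen sb returns EINVAL, and thaw_super_excl() drops the
exclusive holder's reference even while non-exclusive holders remain.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_sb {
	int freeze_count;	/* outstanding freeze references */
	bool excl;		/* the first freeze came from the _excl() variant */
};

static bool toy_sb_frozen(const struct toy_sb *sb)
{
	return sb->freeze_count > 0;
}

/* Exclusive freeze: only succeeds on an unfrozen superblock. */
static int toy_freeze_super_excl(struct toy_sb *sb)
{
	if (sb->freeze_count > 0)
		return -EBUSY;
	sb->freeze_count = 1;
	sb->excl = true;
	return 0;
}

/* Non-exclusive freeze: nests with any other freeze. */
static int toy_freeze_super(struct toy_sb *sb)
{
	sb->freeze_count++;
	return 0;
}

/* Refuses to drop the last reference if it belongs to an exclusive freezer. */
static int toy_thaw_super(struct toy_sb *sb)
{
	if (sb->freeze_count == 0)
		return -EINVAL;	/* assumption: nothing to thaw */
	if (sb->freeze_count == 1 && sb->excl)
		return -EINVAL;
	sb->freeze_count--;
	return 0;
}

/* Only valid if the sb was frozen with the exclusive variant. */
static int toy_thaw_super_excl(struct toy_sb *sb)
{
	if (!sb->excl)
		return -EINVAL;
	assert(sb->freeze_count > 0);
	sb->excl = false;
	sb->freeze_count--;
	return 0;
}
```

In this model the filesystem stays frozen until the counter reaches zero, so
an ioctl-initiated (exclusive) freeze survives a suspend/resume cycle that
uses the non-exclusive pair.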

Also when doing this, you'd need to move code like:

        if (sb->s_op->freeze_super)
                error = sb->s_op->freeze_super(sb);
        else
                error = freeze_super(sb);

into the freeze_super() / freeze_super_excl() handler behind the
freeze counting code which might be a bit tricky WRT locking. GFS2 is the
only fs having ->freeze_super() and that callback was implemented specifically
so that it can do its own (cluster wide) locking before generic code grabbing
s_umount semaphore. Then internally GFS2 ends up calling freeze_super()
from freeze_go_sync() when cluster lock is acquired.


> > 2) It is not that normal users + one special user (who owns the "flag" in
> > the superblock in form of a special freeze state) setup. We'd simply have
> > exclusive and non-exclusive users of superblock freezing and there can be
> > arbitrary numbers of them.
> 
> Sorry I did not understand this point. Can you rephrase perhaps a bit?
> 
> Anyway, I just tried implementing this and it seemed rather easy to
> use a pivot, however note that then freeze_processes() which calls
> fs_suspend_freeze() would somehow need to pass the failed sb... do
> we want to have let fs_suspend_freeze() pass a parameter to be set
> to the failed sb of it failed? Locking-wise this seems racy.

So with your iterate_supers_excl() doing this is somewhat difficult but you
could have something like:

int freeze_all_supers(void)
{
	struct super_block *sb, *p = NULL;
	int error = 0;

	spin_lock(&sb_lock);
	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
		if (hlist_unhashed(&sb->s_instances))
			continue;
		sb->s_count++;
		spin_unlock(&sb_lock);

		down_write(&sb->s_umount);
		if (sb->s_root && (sb->s_flags & SB_BORN)) {
			error = freeze_super(sb);
			if (error) {
				up_write(&sb->s_umount);
				spin_lock(&sb_lock);
				if (p)
					__put_super(p);
				p = sb;
				list_for_each_entry_continue(sb, &super_blocks,
							     s_list) {
					if (hlist_unhashed(&sb->s_instances))
						continue;
					sb->s_count++;
					spin_unlock(&sb_lock);

					down_write(&sb->s_umount);
					if (sb->s_root && (sb->s_flags & SB_BORN))
						thaw_super(sb);
					up_write(&sb->s_umount);

					spin_lock(&sb_lock);
					if (p)
						__put_super(p);
					p = sb;
				}
				break;
			}
		}
		up_write(&sb->s_umount);

		spin_lock(&sb_lock);
		if (p)
			__put_super(p);
		p = sb;
	}
	if (p)
		__put_super(p);
	spin_unlock(&sb_lock);

	return error;
}

And you could possibly factor that out into two helper functions for
iterating the superblocks, just they'd need more parameters and you'd need
to pass a reference (sb->s_count) when passing in the 'pivot' as you call it.
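
Stripped of locking and reference counting, the control flow of the function
above is "freeze in reverse order, and on failure thaw forward from the entry
after the failed one". A userspace sketch of just that flow, with a counted
(non-exclusive) freeze, might look like the following. The array, struct
toy_sb, the fail_freeze knob and the toy_*() names are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_sb {
	int freeze_count;	/* counted, non-exclusive freeze */
	int fail_freeze;	/* test knob: make freezing this sb fail */
};

static int toy_freeze(struct toy_sb *sb)
{
	if (sb->fail_freeze)
		return -EIO;
	sb->freeze_count++;
	return 0;
}

static void toy_thaw(struct toy_sb *sb)
{
	assert(sb->freeze_count > 0);
	sb->freeze_count--;
}

/*
 * Freeze sbs[n-1] down to sbs[0]. If freezing sbs[i] fails, drop the
 * references taken on sbs[i+1] .. sbs[n-1] and return the error, so
 * that either all entries gain a reference or none do.
 */
static int toy_freeze_all(struct toy_sb *sbs, size_t n)
{
	size_t i, j;
	int error;

	for (i = n; i-- > 0; ) {
		error = toy_freeze(&sbs[i]);
		if (error) {
			for (j = i + 1; j < n; j++)
				toy_thaw(&sbs[j]);
			return error;
		}
	}
	return 0;
}
```

Because the freeze is counted, an entry that was already frozen by someone
else before the call keeps that earlier reference after the rollback.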

								Honza
Luis Chamberlain April 18, 2018, 12:59 a.m. UTC | #2
On Thu, Dec 21, 2017 at 12:03:29PM +0100, Jan Kara wrote:
> Hello,
> 
> I think I owe you a reply here... Sorry that it took so long.

Took me just as long :)

> On Fri 01-12-17 22:13:27, Luis R. Rodriguez wrote:
> > 
> > I'll note that its still not perfectly clear if really the semantics behind
> > freeze_bdev() match what I described above fully. That still needs to be
> > vetted for. For instance, does thaw_bdev() keep a superblock frozen if we
> > an ioctl initiated freeze had occurred before? If so then great. Otherwise
> > I think we'll need to distinguish the ioctl interface. Worst possible case
> > is that bdev semantics and in-kernel semantics differ somehow, then that
> > will really create a holy fucking mess.
> 
> I believe nobody really thought about mixing those two interfaces to fs
> freezing and so the behavior is basically defined by the implementation.
> That is:
> 
> freeze_bdev() on sb frozen by ioctl_fsfreeze() -> EBUSY

Note below as well on your *future* freeze_super() implementation.

> freeze_bdev() on sb frozen by freeze_bdev() -> success
> ioctl_fsfreeze() on sb frozen by freeze_bdev() -> EBUSY
> ioctl_fsfreeze() on sb frozen by ioctl_fsfreeze() -> EBUSY
> 
> thaw_bdev() on sb frozen by ioctl_fsfreeze() -> EINVAL

Phew, so this is what we want for the in-kernel freezing so we're good
and *can* combine these then.

> ioctl_fsthaw() on sb frozen by freeze_bdev() -> success
> 
> What I propose is the following API:
> 
> freeze_super_excl()
>   - freezes superblock, returns EBUSY if the superblock is already frozen
>     (either by another freeze_super_excl() or by freeze_super())
> freeze_super()
>   - this function will make sure superblock is frozen when the function
>     returns with success. 

That's straightforward.

>     It can be nested with other freeze_super() or
>     freeze_super_excl() calls 

This is where it can get hairy. More below.

>     (this second part is different from how
>     freeze_bdev() behaves currently but AFAICT this behavior is actually
>     what all current users of freeze_bdev() really want - just make sure
>     fs cannot be written to)

If we can agree to this, then sure. However there are two types of
possible nested calls to consider: one where the sb was already frozen
by an ioctl, and the other where the freeze was initiated by either another
freeze_super_excl() or another freeze_super() call which is currently
being processed. For the first type, it's easy to say the device is
already frozen and as such return success. If the freezing is ongoing,
we may want to wait or not wait, and this will depend on our current
use cases for freeze_bdev().

As you noted above, freeze_bdev() currently returns EBUSY if we had
the sb already frozen by ioctl_fsfreeze(). It may be a welcome
enhancement to correct the semantics first to address the first case,
but keep the EBUSY for the other case. A secondary patch could then
add a completion mechanism and let callers decide to either wait or not.
*Iff* the caller did not opt in to wait we keep the EBUSY return.

Seem reasonable?

I'll address the rest of the mail later.

  Luis
Jan Kara April 18, 2018, 10:12 a.m. UTC | #3
On Tue 17-04-18 17:59:36, Luis R. Rodriguez wrote:
> On Thu, Dec 21, 2017 at 12:03:29PM +0100, Jan Kara wrote:
> > On Fri 01-12-17 22:13:27, Luis R. Rodriguez wrote:
> > > 
> > > I'll note that its still not perfectly clear if really the semantics behind
> > > freeze_bdev() match what I described above fully. That still needs to be
> > > vetted for. For instance, does thaw_bdev() keep a superblock frozen if we
> > > an ioctl initiated freeze had occurred before? If so then great. Otherwise
> > > I think we'll need to distinguish the ioctl interface. Worst possible case
> > > is that bdev semantics and in-kernel semantics differ somehow, then that
> > > will really create a holy fucking mess.
> > 
> > I believe nobody really thought about mixing those two interfaces to fs
> > freezing and so the behavior is basically defined by the implementation.
> > That is:
> > 
> > freeze_bdev() on sb frozen by ioctl_fsfreeze() -> EBUSY
> 
> Note below as well on your *future* freeze_super() implementation.
> 
> > freeze_bdev() on sb frozen by freeze_bdev() -> success
> > ioctl_fsfreeze() on sb frozen by freeze_bdev() -> EBUSY
> > ioctl_fsfreeze() on sb frozen by ioctl_fsfreeze() -> EBUSY
> > 
> > thaw_bdev() on sb frozen by ioctl_fsfreeze() -> EINVAL
> 
> Phew, so this is what we want for the in-kernel freezing so we're good
> and *can* combine these then.
> 
> > ioctl_fsthaw() on sb frozen by freeze_bdev() -> success
> > 
> > What I propose is the following API:
> > 
> > freeze_super_excl()
> >   - freezes superblock, returns EBUSY if the superblock is already frozen
> >     (either by another freeze_super_excl() or by freeze_super())
> > freeze_super()
> >   - this function will make sure superblock is frozen when the function
> >     returns with success. 
> 
> That's straight forward.
> 
> >     It can be nested with other freeze_super() or
> >     freeze_super_excl() calls 
> 
> This is where it can get hairy. More below.
> 
> >     (this second part is different from how
> >     freeze_bdev() behaves currently but AFAICT this behavior is actually
> >     what all current users of freeze_bdev() really want - just make sure
> >     fs cannot be written to)
> 
> If we can agree to this, then sure. However there are two types of
> possible nested calls to consider, one where the sb was already frozen
> by an IOCTL, and the other where it was initiated by either another
> freeze_super_excl() or another freeze_super() call which is currently
> being processed. For the first type, its easy to say the device is
> already frozen as such return success. If the freezing is ongoing,
> we may want to wait or not wait, and this will depend on our current
> use cases for freeze_bdev().

A side note since I'm not sure I wrote this down in my previous email:
I want ioctl_fsfreeze() to directly use freeze_super_excl().

Now to your freeze in progress question: freeze_super_excl() can
immediately return EBUSY when there's freezing in progress. OTOH
freeze_super() always has to wait for the current freeze / thaw to finish
and then do what's necessary. I don't see a use case where you'd like to
have freeze_super() not wait.

> As you noted above, freeze_bdev() currently returns EBUSY if we had
> the sb already frozen by ioctl_fsfreeze(). It may be a welcomed
> enhancement to correct the semantics first to address the first case,
> but keep the EBUSY for the other case. A secondary patch could then
> add a completion mechanism and let callers decide to either wait or not.
> *Iff* the caller did not opt-in to wait we keep the EBUSY return.

You're now speaking about steps to transition to the new API, right? I'd
structure the transition as follows:

1) Move bdev->bd_fsfreeze_count to the superblock.
2) Make freeze_super() grab the counter as well, thaw_super() drops it and
  unfreezes the filesystem only if the counter dropped to zero.
3) Rename freeze_super() to freeze_super_excl().
4) Only now I'd go for messing with freeze_bdev() as it now combines sanely
with freeze_super_excl(). Probably I'd just implement new freeze_super()
with the desired semantics (including waiting for ongoing operation to
finish).
5) And then switch all users (there are 4 in the kernel) from freeze_bdev()
to freeze_super() with the justification in each case why the new semantics
is actually desirable.
6) Drop old freeze_bdev() - note that only one freeze_bdev() user (in
drivers/md/dm.c) is actually interested in passing bdev, all the others are
better off just passing in superblock to new freeze_super(). Anyway for
that user in dm we might still provide a convenience wrapper to grab the
superblock and call new freeze_super() on it.

								Honza
Luis Chamberlain April 20, 2018, 6:49 p.m. UTC | #4
On Tue, Apr 17, 2018 at 05:59:36PM -0700, Luis R. Rodriguez wrote:
> On Thu, Dec 21, 2017 at 12:03:29PM +0100, Jan Kara wrote:
> > Hello,
> > 
> > I think I owe you a reply here... Sorry that it took so long.
> 
> Took me just as long :)
> 
> > On Fri 01-12-17 22:13:27, Luis R. Rodriguez wrote:
> > > 
> > > I'll note that its still not perfectly clear if really the semantics behind
> > > freeze_bdev() match what I described above fully. That still needs to be
> > > vetted for. For instance, does thaw_bdev() keep a superblock frozen if we
> > > an ioctl initiated freeze had occurred before? If so then great. Otherwise
> > > I think we'll need to distinguish the ioctl interface. Worst possible case
> > > is that bdev semantics and in-kernel semantics differ somehow, then that
> > > will really create a holy fucking mess.
> > 
> > I believe nobody really thought about mixing those two interfaces to fs
> > freezing and so the behavior is basically defined by the implementation.
> > That is:
> > 
> > freeze_bdev() on sb frozen by ioctl_fsfreeze() -> EBUSY
> > freeze_bdev() on sb frozen by freeze_bdev() -> success
> > ioctl_fsfreeze() on sb frozen by freeze_bdev() -> EBUSY
> > ioctl_fsfreeze() on sb frozen by ioctl_fsfreeze() -> EBUSY
> > 
> > thaw_bdev() on sb frozen by ioctl_fsfreeze() -> EINVAL
> 
> Phew, so this is what we want for the in-kernel freezing so we're good
> and *can* combine these then.

I double checked, and I don't see where you get EINVAL for this case.
We *do* keep the sb frozen though, which is good, and the worst fear
I had was that we did not. However we return 0 if there was already
a prior freeze_bdev() or ioctl_fsfreeze() other than the context that
started the prior freeze (--bdev->bd_fsfreeze_count > 0).

The -EINVAL is only returned currently if there were no freezers.

int thaw_bdev(struct block_device *bdev, struct super_block *sb)
{
	int error = -EINVAL;

	mutex_lock(&bdev->bd_fsfreeze_mutex);
	if (!bdev->bd_fsfreeze_count)
		goto out;

	error = 0;
	if (--bdev->bd_fsfreeze_count > 0)
		goto out;
	...
out:
	mutex_unlock(&bdev->bd_fsfreeze_mutex);
	return error;
}

  Luis
Jan Kara April 21, 2018, 11:53 p.m. UTC | #5
On Fri 20-04-18 11:49:32, Luis R. Rodriguez wrote:
> On Tue, Apr 17, 2018 at 05:59:36PM -0700, Luis R. Rodriguez wrote:
> > On Thu, Dec 21, 2017 at 12:03:29PM +0100, Jan Kara wrote:
> > > Hello,
> > > 
> > > I think I owe you a reply here... Sorry that it took so long.
> > 
> > Took me just as long :)
> > 
> > > On Fri 01-12-17 22:13:27, Luis R. Rodriguez wrote:
> > > > 
> > > > I'll note that its still not perfectly clear if really the semantics behind
> > > > freeze_bdev() match what I described above fully. That still needs to be
> > > > vetted for. For instance, does thaw_bdev() keep a superblock frozen if we
> > > > an ioctl initiated freeze had occurred before? If so then great. Otherwise
> > > > I think we'll need to distinguish the ioctl interface. Worst possible case
> > > > is that bdev semantics and in-kernel semantics differ somehow, then that
> > > > will really create a holy fucking mess.
> > > 
> > > I believe nobody really thought about mixing those two interfaces to fs
> > > freezing and so the behavior is basically defined by the implementation.
> > > That is:
> > > 
> > > freeze_bdev() on sb frozen by ioctl_fsfreeze() -> EBUSY
> > > freeze_bdev() on sb frozen by freeze_bdev() -> success
> > > ioctl_fsfreeze() on sb frozen by freeze_bdev() -> EBUSY
> > > ioctl_fsfreeze() on sb frozen by ioctl_fsfreeze() -> EBUSY
> > > 
> > > thaw_bdev() on sb frozen by ioctl_fsfreeze() -> EINVAL
> > 
> > Phew, so this is what we want for the in-kernel freezing so we're good
> > and *can* combine these then.
> 
> I double checked, and I don't see where you get EINVAL for this case.
> We *do* keep the sb frozen though, which is good, and the worst fear
> I had was that we did not. However we return 0 if there was already
> a prior freeze_bdev() or ioctl_fsfreeze() other than the context that
> started the prior freeze (--bdev->bd_fsfreeze_count > 0).
> 
> The -EINVAL is only returned currently if there were no freezers.
> 
> int thaw_bdev(struct block_device *bdev, struct super_block *sb)
> {
> 	int error = -EINVAL;
> 
> 	mutex_lock(&bdev->bd_fsfreeze_mutex);
> 	if (!bdev->bd_fsfreeze_count)
> 		goto out;

But this is precisely where we'd bail if we freeze sb by ioctl_fsfreeze()
but try to thaw by thaw_bdev(). ioctl_fsfreeze() does not touch
bd_fsfreeze_count...

								Honza
Luis Chamberlain April 22, 2018, 1:22 a.m. UTC | #6
On Sun, Apr 22, 2018 at 01:53:23AM +0200, Jan Kara wrote:
> On Fri 20-04-18 11:49:32, Luis R. Rodriguez wrote:
> > On Tue, Apr 17, 2018 at 05:59:36PM -0700, Luis R. Rodriguez wrote:
> > > On Thu, Dec 21, 2017 at 12:03:29PM +0100, Jan Kara wrote:
> > > > Hello,
> > > > 
> > > > I think I owe you a reply here... Sorry that it took so long.
> > > 
> > > Took me just as long :)
> > > 
> > > > On Fri 01-12-17 22:13:27, Luis R. Rodriguez wrote:
> > > > > 
> > > > > I'll note that its still not perfectly clear if really the semantics behind
> > > > > freeze_bdev() match what I described above fully. That still needs to be
> > > > > vetted for. For instance, does thaw_bdev() keep a superblock frozen if we
> > > > > an ioctl initiated freeze had occurred before? If so then great. Otherwise
> > > > > I think we'll need to distinguish the ioctl interface. Worst possible case
> > > > > is that bdev semantics and in-kernel semantics differ somehow, then that
> > > > > will really create a holy fucking mess.
> > > > 
> > > > I believe nobody really thought about mixing those two interfaces to fs
> > > > freezing and so the behavior is basically defined by the implementation.
> > > > That is:
> > > > 
> > > > freeze_bdev() on sb frozen by ioctl_fsfreeze() -> EBUSY
> > > > freeze_bdev() on sb frozen by freeze_bdev() -> success
> > > > ioctl_fsfreeze() on sb frozen by freeze_bdev() -> EBUSY
> > > > ioctl_fsfreeze() on sb frozen by ioctl_fsfreeze() -> EBUSY
> > > > 
> > > > thaw_bdev() on sb frozen by ioctl_fsfreeze() -> EINVAL
> > > 
> > > Phew, so this is what we want for the in-kernel freezing so we're good
> > > and *can* combine these then.
> > 
> > I double checked, and I don't see where you get EINVAL for this case.
> > We *do* keep the sb frozen though, which is good, and the worst fear
> > I had was that we did not. However we return 0 if there was already
> > a prior freeze_bdev() or ioctl_fsfreeze() other than the context that
> > started the prior freeze (--bdev->bd_fsfreeze_count > 0).
> > 
> > The -EINVAL is only returned currently if there were no freezers.
> > 
> > int thaw_bdev(struct block_device *bdev, struct super_block *sb)
> > {
> > 	int error = -EINVAL;
> > 
> > 	mutex_lock(&bdev->bd_fsfreeze_mutex);
> > 	if (!bdev->bd_fsfreeze_count)
> > 		goto out;
> 
> But this is precisely where we'd bail if we freeze sb by ioctl_fsfreeze()
> but try to thaw by thaw_bdev(). ioctl_fsfreeze() does not touch
> bd_fsfreeze_count...

Ah, yes, I see that now, thanks!

  Luis

Patch

diff --git a/fs/super.c b/fs/super.c
index 885711c1d35b..8cb6f38652d8 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -614,13 +614,21 @@  void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
  *	locked superblock and given argument. Returns 0 unless an error
  *	occurred on calling the function on any superblock.
  */
-int iterate_supers_excl(int (*f)(struct super_block *, void *), void *arg)
+int iterate_supers_excl(int (*f)(struct super_block *, void *), void *arg,
+			struct super_block *pivot)
 {
 	struct super_block *sb, *p = NULL;
 	int error = 0;
 
 	spin_lock(&sb_lock);
 	list_for_each_entry(sb, &super_blocks, s_list) {
+		/* If we have a pivot, start work on the next item */
+		if (pivot) {
+			if (sb != pivot)
+				continue;
+			pivot = NULL;
+			continue;
+		}
 		if (hlist_unhashed(&sb->s_instances))
 			continue;
 		sb->s_count++;