diff mbox

[RFC,v3,1/5] Revert "btrfs: add support for processing pending changes" related commits

Message ID 1422005505-9472-2-git-send-email-quwenruo@cn.fujitsu.com (mailing list archive)
State Superseded
Headers show

Commit Message

Qu Wenruo Jan. 23, 2015, 9:31 a.m. UTC
This reverts commit 572d9ab7845 ~ a6f69dc8018.

This pending commits patches introduce deadlock with freeze, and fix for
it will introduce extra checks on freeze and read only case.

For mount option change, later patches will introduce copy-n-update
method and rwsem protects to keep mount options consistent during
transaction.

For sysfs interface to change label/features, it will keep the same
behavior as 'btrfs pro set', so pending changes are also not needed.

Revert them to a clean base for later changes.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>

Conflicts:
	fs/btrfs/ctree.h
	fs/btrfs/super.c
---
 fs/btrfs/ctree.h       | 49 +------------------------------------------------
 fs/btrfs/disk-io.c     |  8 +++-----
 fs/btrfs/inode-map.c   |  2 +-
 fs/btrfs/super.c       | 14 +++-----------
 fs/btrfs/sysfs.c       | 34 +++++++++++++++++++++-------------
 fs/btrfs/transaction.c | 38 ++++++--------------------------------
 fs/btrfs/transaction.h |  2 --
 7 files changed, 35 insertions(+), 112 deletions(-)

Comments

David Sterba Jan. 23, 2015, 2:57 p.m. UTC | #1
On Fri, Jan 23, 2015 at 05:31:41PM +0800, Qu Wenruo wrote:
> For mount option change, later patches will introduce copy-n-update
> method and rwsem protects to keep mount options consistent during
> transaction.

That's a better approach, for the mount options.

> For sysfs interface to change label/features, it will keep the same
> behavior as 'btrfs pro set', so pending changes are also not needed.

This still leaves the transaction commit inside the syfs handler, that
was one of the points not to do that.

The callstack looks safe from, eg. the label handler:

[169148.523158] WARNING: CPU: 1 PID: 2044 at fs/btrfs/sysfs.c:394 btrfs_label_store+0x135/0x190 [btrfs]()
[169148.533925] Modules linked in: btrfs dm_flakey rpcsec_gss_krb5 loop [last unloaded: btrfs]
[169148.536950] CPU: 1 PID: 2044 Comm: bash Tainted: G        W      3.19.0-rc5-default+ #211
[169148.536952] Hardware name: Intel Corporation Santa Rosa platform/Matanzas, BIOS TSRSCRB1.86C.0047.B00.0610170821 10/17/06
[169148.536954]  000000000000018a ffff88007a753dc8 ffffffff81a9898b 000000000000018a
[169148.536963]  0000000000000000 ffff88007a753e08 ffffffff81077f65 ffff880077fb0100
[169148.536972]  ffff880075dc0000 ffff880077fbff00 0000000000000009 ffff880075dc06d0
[169148.536980] Call Trace:
[169148.536983]  [<ffffffff81a9898b>] dump_stack+0x4f/0x6c
[169148.536991]  [<ffffffff81077f65>] warn_slowpath_common+0x95/0xe0
[169148.537000]  [<ffffffff81077fca>] warn_slowpath_null+0x1a/0x20
[169148.537005]  [<ffffffffa0052b65>] btrfs_label_store+0x135/0x190 [btrfs]
[169148.537030]  [<ffffffff813ed8b7>] kobj_attr_store+0x17/0x20
[169148.537037]  [<ffffffff812147ff>] sysfs_kf_write+0x4f/0x70
[169148.537044]  [<ffffffff81213cc8>] kernfs_fop_write+0x128/0x180
[169148.537051]  [<ffffffff8119f404>] vfs_write+0xd4/0x1d0
[169148.537059]  [<ffffffff8119f7b9>] SyS_write+0x59/0xd0
[169148.537070]  [<ffffffff81a9f9d2>] system_call_fastpath+0x12/0x17

Lockep shows these locks held:

[169148.537296] 4 locks held by bash/2044:
[169148.537309]  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff8119f4e0>] vfs_write+0x1b0/0x1d0
[169148.537319]  #1:  (&of->mutex){+.+.+.}, at: [<ffffffff81213c2e>] kernfs_fop_write+0x8e/0x180
[169148.537330]  #2:  (s_active#214){.+.+.+}, at: [<ffffffff81213c36>] kernfs_fop_write+0x96/0x180
[169148.537342]  #3:  (tasklist_lock){.+.+..}, at: [<ffffffff810b9ed4>] debug_show_all_locks+0x44/0x1e0

#3 is from lockdep
#2 is not really a lock, annotated vfs atomic counter
#0 is annotated atomic, the freezing barrier

#1 is a kernfs mutex that, afaics it's per file, but I don't like to see
the lock dependency here. That's a lock we can see now, but it's outside
of btrfs or the vfs. It's a matter of precaution.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Jan. 26, 2015, 12:37 a.m. UTC | #2
-------- Original Message --------
Subject: Re: [PATCH RFC v3 1/5] Revert "btrfs: add support for 
processing pending changes" related commits
From: David Sterba <dsterba@suse.cz>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Date: 2015?01?23? 22:57
> On Fri, Jan 23, 2015 at 05:31:41PM +0800, Qu Wenruo wrote:
>> For mount option change, later patches will introduce copy-n-update
>> method and rwsem protects to keep mount options consistent during
>> transaction.
> That's a better approach, for the mount options.
I'm glad that you like this method.
Although the description in this patch is outdated, it is now 
per-transaction mount option.
Sorry for the confusion.
>
>> For sysfs interface to change label/features, it will keep the same
>> behavior as 'btrfs pro set', so pending changes are also not needed.
> This still leaves the transaction commit inside the syfs handler, that
> was one of the points not to do that.
>
> The callstack looks safe from, eg. the label handler:
>
> [169148.523158] WARNING: CPU: 1 PID: 2044 at fs/btrfs/sysfs.c:394 btrfs_label_store+0x135/0x190 [btrfs]()
> [169148.533925] Modules linked in: btrfs dm_flakey rpcsec_gss_krb5 loop [last unloaded: btrfs]
> [169148.536950] CPU: 1 PID: 2044 Comm: bash Tainted: G        W      3.19.0-rc5-default+ #211
> [169148.536952] Hardware name: Intel Corporation Santa Rosa platform/Matanzas, BIOS TSRSCRB1.86C.0047.B00.0610170821 10/17/06
> [169148.536954]  000000000000018a ffff88007a753dc8 ffffffff81a9898b 000000000000018a
> [169148.536963]  0000000000000000 ffff88007a753e08 ffffffff81077f65 ffff880077fb0100
> [169148.536972]  ffff880075dc0000 ffff880077fbff00 0000000000000009 ffff880075dc06d0
> [169148.536980] Call Trace:
> [169148.536983]  [<ffffffff81a9898b>] dump_stack+0x4f/0x6c
> [169148.536991]  [<ffffffff81077f65>] warn_slowpath_common+0x95/0xe0
> [169148.537000]  [<ffffffff81077fca>] warn_slowpath_null+0x1a/0x20
> [169148.537005]  [<ffffffffa0052b65>] btrfs_label_store+0x135/0x190 [btrfs]
> [169148.537030]  [<ffffffff813ed8b7>] kobj_attr_store+0x17/0x20
> [169148.537037]  [<ffffffff812147ff>] sysfs_kf_write+0x4f/0x70
> [169148.537044]  [<ffffffff81213cc8>] kernfs_fop_write+0x128/0x180
> [169148.537051]  [<ffffffff8119f404>] vfs_write+0xd4/0x1d0
> [169148.537059]  [<ffffffff8119f7b9>] SyS_write+0x59/0xd0
> [169148.537070]  [<ffffffff81a9f9d2>] system_call_fastpath+0x12/0x17
>
> Lockep shows these locks held:
>
> [169148.537296] 4 locks held by bash/2044:
> [169148.537309]  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff8119f4e0>] vfs_write+0x1b0/0x1d0
> [169148.537319]  #1:  (&of->mutex){+.+.+.}, at: [<ffffffff81213c2e>] kernfs_fop_write+0x8e/0x180
> [169148.537330]  #2:  (s_active#214){.+.+.+}, at: [<ffffffff81213c36>] kernfs_fop_write+0x96/0x180
> [169148.537342]  #3:  (tasklist_lock){.+.+..}, at: [<ffffffff810b9ed4>] debug_show_all_locks+0x44/0x1e0
>
> #3 is from lockdep
> #2 is not really a lock, annotated vfs atomic counter
> #0 is annotated atomic, the freezing barrier
>
> #1 is a kernfs mutex that, afaics it's per file, but I don't like to see
> the lock dependency here. That's a lock we can see now, but it's outside
> of btrfs or the vfs. It's a matter of precaution.
Thanks for pointing out the problem.
It makes sense to delay it.

But we have btrfs-workqueue, why not put it to "worker" workqueue?

If using this method, we can just wrap btrfs_ioctl_set_fslabel() and 
queue it to fs_info->workers.
This can avoid the the lockdep problem, but the behavior is still 
inconsistent with the synchronized
ioctl method.
Although not perfect, it should be good enough and still clean enough.

What do you think about such method?

Thanks,
Qu
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Jan. 26, 2015, 6:05 a.m. UTC | #3
-------- Original Message --------
Subject: Re: [PATCH RFC v3 1/5] Revert "btrfs: add support for 
processing pending changes" related commits
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: dsterba@suse.cz, linux-btrfs@vger.kernel.org, miaoxie@huawei.com
Date: 2015?01?26? 08:37
>
> -------- Original Message --------
> Subject: Re: [PATCH RFC v3 1/5] Revert "btrfs: add support for 
> processing pending changes" related commits
> From: David Sterba <dsterba@suse.cz>
> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
> Date: 2015?01?23? 22:57
>> On Fri, Jan 23, 2015 at 05:31:41PM +0800, Qu Wenruo wrote:
>>> For mount option change, later patches will introduce copy-n-update
>>> method and rwsem protects to keep mount options consistent during
>>> transaction.
>> That's a better approach, for the mount options.
> I'm glad that you like this method.
> Although the description in this patch is outdated, it is now 
> per-transaction mount option.
> Sorry for the confusion.
>>
>>> For sysfs interface to change label/features, it will keep the same
>>> behavior as 'btrfs pro set', so pending changes are also not needed.
>> This still leaves the transaction commit inside the syfs handler, that
>> was one of the points not to do that.
>>
>> The callstack looks safe from, eg. the label handler:
>>
>> [169148.523158] WARNING: CPU: 1 PID: 2044 at fs/btrfs/sysfs.c:394 
>> btrfs_label_store+0x135/0x190 [btrfs]()
>> [169148.533925] Modules linked in: btrfs dm_flakey rpcsec_gss_krb5 
>> loop [last unloaded: btrfs]
>> [169148.536950] CPU: 1 PID: 2044 Comm: bash Tainted: G W      
>> 3.19.0-rc5-default+ #211
>> [169148.536952] Hardware name: Intel Corporation Santa Rosa 
>> platform/Matanzas, BIOS TSRSCRB1.86C.0047.B00.0610170821 10/17/06
>> [169148.536954]  000000000000018a ffff88007a753dc8 ffffffff81a9898b 
>> 000000000000018a
>> [169148.536963]  0000000000000000 ffff88007a753e08 ffffffff81077f65 
>> ffff880077fb0100
>> [169148.536972]  ffff880075dc0000 ffff880077fbff00 0000000000000009 
>> ffff880075dc06d0
>> [169148.536980] Call Trace:
>> [169148.536983]  [<ffffffff81a9898b>] dump_stack+0x4f/0x6c
>> [169148.536991]  [<ffffffff81077f65>] warn_slowpath_common+0x95/0xe0
>> [169148.537000]  [<ffffffff81077fca>] warn_slowpath_null+0x1a/0x20
>> [169148.537005]  [<ffffffffa0052b65>] btrfs_label_store+0x135/0x190 
>> [btrfs]
>> [169148.537030]  [<ffffffff813ed8b7>] kobj_attr_store+0x17/0x20
>> [169148.537037]  [<ffffffff812147ff>] sysfs_kf_write+0x4f/0x70
>> [169148.537044]  [<ffffffff81213cc8>] kernfs_fop_write+0x128/0x180
>> [169148.537051]  [<ffffffff8119f404>] vfs_write+0xd4/0x1d0
>> [169148.537059]  [<ffffffff8119f7b9>] SyS_write+0x59/0xd0
>> [169148.537070]  [<ffffffff81a9f9d2>] system_call_fastpath+0x12/0x17
>>
>> Lockep shows these locks held:
>>
>> [169148.537296] 4 locks held by bash/2044:
>> [169148.537309]  #0:  (sb_writers#5){.+.+.+}, at: 
>> [<ffffffff8119f4e0>] vfs_write+0x1b0/0x1d0
>> [169148.537319]  #1:  (&of->mutex){+.+.+.}, at: [<ffffffff81213c2e>] 
>> kernfs_fop_write+0x8e/0x180
>> [169148.537330]  #2:  (s_active#214){.+.+.+}, at: 
>> [<ffffffff81213c36>] kernfs_fop_write+0x96/0x180
>> [169148.537342]  #3:  (tasklist_lock){.+.+..}, at: 
>> [<ffffffff810b9ed4>] debug_show_all_locks+0x44/0x1e0
>>
>> #3 is from lockdep
>> #2 is not really a lock, annotated vfs atomic counter
>> #0 is annotated atomic, the freezing barrier
>>
>> #1 is a kernfs mutex that, afaics it's per file, but I don't like to see
>> the lock dependency here. That's a lock we can see now, but it's outside
>> of btrfs or the vfs. It's a matter of precaution.
> Thanks for pointing out the problem.
> It makes sense to delay it.
>
> But we have btrfs-workqueue, why not put it to "worker" workqueue?
>
> If using this method, we can just wrap btrfs_ioctl_set_fslabel() and 
> queue it to fs_info->workers.
> This can avoid the the lockdep problem, but the behavior is still 
> inconsistent with the synchronized
> ioctl method.
> Although not perfect, it should be good enough and still clean enough.
Wait a second, #1 is a mutex, so I didn't quite understand the problem.
Just because it is not btrfs/vfs mutex so we want to avoid it?
It seems not convincing enough for me...

For readonly/freeze check, I prefer extra vfsmount from sb->s_mounts and 
use mnt_want_write() (handle ro)
and transaction (handle freeze).
So IMHO it just needs some small tweaks on the original implementation.

Thanks,
Qu
>
> What do you think about such method?
>
> Thanks,
> Qu

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba Jan. 28, 2015, 1:25 p.m. UTC | #4
On Mon, Jan 26, 2015 at 02:05:18PM +0800, Qu Wenruo wrote:
> 
> -------- Original Message --------
> Subject: Re: [PATCH RFC v3 1/5] Revert "btrfs: add support for 
> processing pending changes" related commits
> From: Qu Wenruo <quwenruo@cn.fujitsu.com>
> To: dsterba@suse.cz, linux-btrfs@vger.kernel.org, miaoxie@huawei.com
> Date: 2015?01?26? 08:37
> >
> > -------- Original Message --------
> > Subject: Re: [PATCH RFC v3 1/5] Revert "btrfs: add support for 
> > processing pending changes" related commits
> > From: David Sterba <dsterba@suse.cz>
> > To: Qu Wenruo <quwenruo@cn.fujitsu.com>
> > Date: 2015?01?23? 22:57
> >> On Fri, Jan 23, 2015 at 05:31:41PM +0800, Qu Wenruo wrote:
> >>> For mount option change, later patches will introduce copy-n-update
> >>> method and rwsem protects to keep mount options consistent during
> >>> transaction.
> >> That's a better approach, for the mount options.
> > I'm glad that you like this method.
> > Although the description in this patch is outdated, it is now 
> > per-transaction mount option.
> > Sorry for the confusion.
> >>
> >>> For sysfs interface to change label/features, it will keep the same
> >>> behavior as 'btrfs pro set', so pending changes are also not needed.
> >> This still leaves the transaction commit inside the syfs handler, that
> >> was one of the points not to do that.
> >>
> >> The callstack looks safe from, eg. the label handler:
> >>
> >> [169148.523158] WARNING: CPU: 1 PID: 2044 at fs/btrfs/sysfs.c:394 
> >> btrfs_label_store+0x135/0x190 [btrfs]()
> >> [169148.533925] Modules linked in: btrfs dm_flakey rpcsec_gss_krb5 
> >> loop [last unloaded: btrfs]
> >> [169148.536950] CPU: 1 PID: 2044 Comm: bash Tainted: G W      
> >> 3.19.0-rc5-default+ #211
> >> [169148.536952] Hardware name: Intel Corporation Santa Rosa 
> >> platform/Matanzas, BIOS TSRSCRB1.86C.0047.B00.0610170821 10/17/06
> >> [169148.536954]  000000000000018a ffff88007a753dc8 ffffffff81a9898b 
> >> 000000000000018a
> >> [169148.536963]  0000000000000000 ffff88007a753e08 ffffffff81077f65 
> >> ffff880077fb0100
> >> [169148.536972]  ffff880075dc0000 ffff880077fbff00 0000000000000009 
> >> ffff880075dc06d0
> >> [169148.536980] Call Trace:
> >> [169148.536983]  [<ffffffff81a9898b>] dump_stack+0x4f/0x6c
> >> [169148.536991]  [<ffffffff81077f65>] warn_slowpath_common+0x95/0xe0
> >> [169148.537000]  [<ffffffff81077fca>] warn_slowpath_null+0x1a/0x20
> >> [169148.537005]  [<ffffffffa0052b65>] btrfs_label_store+0x135/0x190 
> >> [btrfs]
> >> [169148.537030]  [<ffffffff813ed8b7>] kobj_attr_store+0x17/0x20
> >> [169148.537037]  [<ffffffff812147ff>] sysfs_kf_write+0x4f/0x70
> >> [169148.537044]  [<ffffffff81213cc8>] kernfs_fop_write+0x128/0x180
> >> [169148.537051]  [<ffffffff8119f404>] vfs_write+0xd4/0x1d0
> >> [169148.537059]  [<ffffffff8119f7b9>] SyS_write+0x59/0xd0
> >> [169148.537070]  [<ffffffff81a9f9d2>] system_call_fastpath+0x12/0x17
> >>
> >> Lockep shows these locks held:
> >>
> >> [169148.537296] 4 locks held by bash/2044:
> >> [169148.537309]  #0:  (sb_writers#5){.+.+.+}, at: 
> >> [<ffffffff8119f4e0>] vfs_write+0x1b0/0x1d0
> >> [169148.537319]  #1:  (&of->mutex){+.+.+.}, at: [<ffffffff81213c2e>] 
> >> kernfs_fop_write+0x8e/0x180
> >> [169148.537330]  #2:  (s_active#214){.+.+.+}, at: 
> >> [<ffffffff81213c36>] kernfs_fop_write+0x96/0x180
> >> [169148.537342]  #3:  (tasklist_lock){.+.+..}, at: 
> >> [<ffffffff810b9ed4>] debug_show_all_locks+0x44/0x1e0
> >>
> >> #3 is from lockdep
> >> #2 is not really a lock, annotated vfs atomic counter
> >> #0 is annotated atomic, the freezing barrier
> >>
> >> #1 is a kernfs mutex that, afaics it's per file, but I don't like to see
> >> the lock dependency here. That's a lock we can see now, but it's outside
> >> of btrfs or the vfs. It's a matter of precaution.
> > Thanks for pointing out the problem.
> > It makes sense to delay it.
> >
> > But we have btrfs-workqueue, why not put it to "worker" workqueue?

This only differs how the work item is stored until it's processed, it
will have to synchronize with the transaction commit anyway. The pending
changes can act as a special purpose work queue.

> > If using this method, we can just wrap btrfs_ioctl_set_fslabel() and 
> > queue it to fs_info->workers.
> > This can avoid the the lockdep problem,

There's no lockdep problem, the output comes from lockdep but just to
illustrate the lock chain at the time of potential commit time.

> > but the behavior is still 
> > inconsistent with the synchronized
> > ioctl method.
> > Although not perfect, it should be good enough and still clean enough.
> Wait a second, #1 is a mutex, so I didn't quite understand the problem.
> Just because it is not btrfs/vfs mutex so we want to avoid it?
> It seems not convincing enough for me...

It's sysfs/kernfs, isn't that enough? :)

The problem I see is that the whole transaction commit is called from
under that lock. We do some sysfs-related stuff like add or remove
objects (eg. devices), exporting space info etc.

Are we sure that there's no possible deadlock when we eg. change the
label via sysfs in the middle of a balance that removes some of the
files? Or other combination of operations. Can we guarantee that this
will be ok in the long term and not introduced accidentally?

> For readonly/freeze check, I prefer extra vfsmount from sb->s_mounts and 
> use mnt_want_write() (handle ro)
> and transaction (handle freeze).
> So IMHO it just needs some small tweaks on the original implementation.

But how can we do the mnt_want_write call inside sysfs handler?

Although we could drop the write support for label via sysfs in the end,
this whole exercise can be used later to decide what is/not safe to do
via sysfs. So it's good to try find ways right now.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Jan. 29, 2015, 1:15 a.m. UTC | #5
-------- Original Message --------
Subject: Re: [PATCH RFC v3 1/5] Revert "btrfs: add support for 
processing pending changes" related commits
From: David Sterba <dsterba@suse.cz>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Date: 2015?01?28? 21:25
> On Mon, Jan 26, 2015 at 02:05:18PM +0800, Qu Wenruo wrote:
>> -------- Original Message --------
>> Subject: Re: [PATCH RFC v3 1/5] Revert "btrfs: add support for
>> processing pending changes" related commits
>> From: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> To: dsterba@suse.cz, linux-btrfs@vger.kernel.org, miaoxie@huawei.com
>> Date: 2015?01?26? 08:37
>>> [...snipped...]
>>> Thanks for pointing out the problem.
>>> It makes sense to delay it.
>>>
>>> But we have btrfs-workqueue, why not put it to "worker" workqueue?
> This only differs how the work item is stored until it's processed, it
> will have to synchronize with the transaction commit anyway. The pending
> changes can act as a special purpose work queue.
>
>>> If using this method, we can just wrap btrfs_ioctl_set_fslabel() and
>>> queue it to fs_info->workers.
>>> This can avoid the the lockdep problem,
> There's no lockdep problem, the output comes from lockdep but just to
> illustrate the lock chain at the time of potential commit time.
>
>>> but the behavior is still
>>> inconsistent with the synchronized
>>> ioctl method.
>>> Although not perfect, it should be good enough and still clean enough.
>> Wait a second, #1 is a mutex, so I didn't quite understand the problem.
>> Just because it is not btrfs/vfs mutex so we want to avoid it?
>> It seems not convincing enough for me...
> It's sysfs/kernfs, isn't that enough? :)
>
> The problem I see is that the whole transaction commit is called from
> under that lock. We do some sysfs-related stuff like add or remove
> objects (eg. devices), exporting space info etc.
>
> Are we sure that there's no possible deadlock when we eg. change the
> label via sysfs in the middle of a balance that removes some of the
> files? Or other combination of operations. Can we guarantee that this
> will be ok in the long term and not introduced accidentally?
For me, I didn't see the difference between VFS staff and sysfs/kernfs 
staff.
They both have their own locking things which is out of the control of 
btrfs.

But we are still using VFS staffs, right? If we want to use sysfs 
interfaces to do things like change label,
then it's our responsibility to ensure things works fine. If not we 
should either modify btrfs or sysfs to
do it, just like what we were doing with VFS staffs.

To ensure the cooperation works fine, we just need extra testcases, much 
like what we were doing.

So IMHO, I didn't really see the difference between VFS and sysfs staffs 
(except sysfs is not so wided adapted).
We just needs to do all the old style work, modify btrfs or sysfs or 
both and, and add tons of test case.
>
>> For readonly/freeze check, I prefer extra vfsmount from sb->s_mounts and
>> use mnt_want_write() (handle ro)
>> and transaction (handle freeze).
>> So IMHO it just needs some small tweaks on the original implementation.
> But how can we do the mnt_want_write call inside sysfs handler?
>
> Although we could drop the write support for label via sysfs in the end,
> this whole exercise can be used later to decide what is/not safe to do
> via sysfs. So it's good to try find ways right now.
I have already sent the new patchset, but it seems vger.kernel.org 
refused to accept the patchset,
I'll send it again try using my personal gmail account.

In that patchset, I added a new helper function in VFS, 
get_vfsmount_sb() to get a vfsmount from a sb,
so even we are in sysfs handler, we can still use mnt_want_write() to do 
all the protections.

Thanks,
Qu
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba Jan. 30, 2015, 5:30 p.m. UTC | #6
On Thu, Jan 29, 2015 at 09:15:12AM +0800, Qu Wenruo wrote:
> > Are we sure that there's no possible deadlock when we eg. change the
> > label via sysfs in the middle of a balance that removes some of the
> > files? Or other combination of operations. Can we guarantee that this
> > will be ok in the long term and not introduced accidentally?
> For me, I didn't see the difference between VFS staff and sysfs/kernfs 
> staff.
> They both have their own locking things which is out of the control of 
> btrfs.

VFS is a close neighbor in the layering, syscalls come through it,
affects filesystem state very often and API changes propagate to all the
filesystems.

Sysfs provides us some API but in a very limited scope compared to VFS.

> But we are still using VFS staffs, right? If we want to use sysfs 
> interfaces to do things like change label,
> then it's our responsibility to ensure things works fine.

I absolutelly agree with that and that's why I'm trying to minimize the
potential traps when the subsystems become interconnected too closely,
eg.  depending on internal locks.

> If not we 
> should either modify btrfs or sysfs to
> do it, just like what we were doing with VFS staffs.

This means changing innternal workings of the two, this seems unlikely
as we're mere users for them. Though we can bring new requests for API
or some such, we can't easily affect their internal logic just because
it's easy for us to throw a transaction commit somewhere and stop
caring.

> To ensure the cooperation works fine, we just need extra testcases, much 
> like what we were doing.
> 
> So IMHO, I didn't really see the difference between VFS and sysfs staffs 
> (except sysfs is not so wided adapted).
> We just needs to do all the old style work, modify btrfs or sysfs or 
> both and, and add tons of test case.

And I see a big difference, if nothing else, sysfs is user of VFS layer.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 7e60741..5f99743 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1411,11 +1411,6 @@  struct btrfs_fs_info {
 	 */
 	u64 last_trans_log_full_commit;
 	unsigned long mount_opt;
-	/*
-	 * Track requests for actions that need to be done during transaction
-	 * commit (like for some mount options).
-	 */
-	unsigned long pending_changes;
 	unsigned long compress_type:4;
 	int commit_interval;
 	/*
@@ -2113,6 +2108,7 @@  struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1 << 21)
 #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR	(1 << 22)
 #define BTRFS_MOUNT_RESCAN_UUID_TREE	(1 << 23)
+#define	BTRFS_MOUNT_CHANGE_INODE_CACHE	(1 << 24)
 
 #define BTRFS_DEFAULT_COMMIT_INTERVAL	(30)
 #define BTRFS_DEFAULT_MAX_INLINE	(8192)
@@ -2138,49 +2134,6 @@  struct btrfs_ioctl_defrag_range_args {
 }
 
 /*
- * Requests for changes that need to be done during transaction commit.
- *
- * Internal mount options that are used for special handling of the real
- * mount options (eg. cannot be set during remount and have to be set during
- * transaction commit)
- */
-
-#define BTRFS_PENDING_SET_INODE_MAP_CACHE	(0)
-#define BTRFS_PENDING_CLEAR_INODE_MAP_CACHE	(1)
-#define BTRFS_PENDING_COMMIT			(2)
-
-#define btrfs_test_pending(info, opt)	\
-	test_bit(BTRFS_PENDING_##opt, &(info)->pending_changes)
-#define btrfs_set_pending(info, opt)	\
-	set_bit(BTRFS_PENDING_##opt, &(info)->pending_changes)
-#define btrfs_clear_pending(info, opt)	\
-	clear_bit(BTRFS_PENDING_##opt, &(info)->pending_changes)
-
-/*
- * Helpers for setting pending mount option changes.
- *
- * Expects corresponding macros
- * BTRFS_PENDING_SET_ and CLEAR_ + short mount option name
- */
-#define btrfs_set_pending_and_info(info, opt, fmt, args...)            \
-do {                                                                   \
-       if (!btrfs_raw_test_opt((info)->mount_opt, opt)) {              \
-               btrfs_info((info), fmt, ##args);                        \
-               btrfs_set_pending((info), SET_##opt);                   \
-               btrfs_clear_pending((info), CLEAR_##opt);               \
-       }                                                               \
-} while(0)
-
-#define btrfs_clear_pending_and_info(info, opt, fmt, args...)          \
-do {                                                                   \
-       if (btrfs_raw_test_opt((info)->mount_opt, opt)) {               \
-               btrfs_info((info), fmt, ##args);                        \
-               btrfs_set_pending((info), CLEAR_##opt);                 \
-               btrfs_clear_pending((info), SET_##opt);                 \
-       }                                                               \
-} while(0)
-
-/*
  * Inode flags
  */
 #define BTRFS_INODE_NODATASUM		(1 << 0)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8c63419..2d3c8b7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2832,11 +2832,9 @@  retry_root_backup:
 		btrfs_set_opt(fs_info->mount_opt, SSD);
 	}
 
-	/*
-	 * Mount does not set all options immediatelly, we can do it now and do
-	 * not have to wait for transaction commit
-	 */
-	btrfs_apply_pending_changes(fs_info);
+	/* Set the real inode map cache flag */
+	if (btrfs_test_opt(tree_root, CHANGE_INODE_CACHE))
+		btrfs_set_opt(tree_root->fs_info->mount_opt, INODE_MAP_CACHE);
 
 #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
 	if (btrfs_test_opt(tree_root, CHECK_INTEGRITY)) {
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index 74faea3..81efd83 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -178,7 +178,7 @@  static void start_caching(struct btrfs_root *root)
 			  root->root_key.objectid);
 	if (IS_ERR(tsk)) {
 		btrfs_warn(root->fs_info, "failed to start inode caching task");
-		btrfs_clear_pending_and_info(root->fs_info, INODE_MAP_CACHE,
+		btrfs_clear_and_info(root, CHANGE_INODE_CACHE,
 				"disabling inode map caching");
 	}
 }
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 60f7cbe..b0c45b2 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -993,17 +993,9 @@  int btrfs_sync_fs(struct super_block *sb, int wait)
 	trans = btrfs_attach_transaction_barrier(root);
 	if (IS_ERR(trans)) {
 		/* no transaction, don't bother */
-		if (PTR_ERR(trans) == -ENOENT) {
-			/*
-			 * Exit unless we have some pending changes
-			 * that need to go through commit
-			 */
-			if (fs_info->pending_changes == 0)
-				return 0;
-			trans = btrfs_start_transaction(root, 0);
-		} else {
-			return PTR_ERR(trans);
-		}
+		if (PTR_ERR(trans) == -ENOENT)
+			return 0;
+		return PTR_ERR(trans);
 	}
 	return btrfs_commit_transaction(trans, root);
 }
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 92db3f6..b2e7bb4 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -111,6 +111,7 @@  static ssize_t btrfs_feature_attr_store(struct kobject *kobj,
 {
 	struct btrfs_fs_info *fs_info;
 	struct btrfs_feature_attr *fa = to_btrfs_feature_attr(a);
+	struct btrfs_trans_handle *trans;
 	u64 features, set, clear;
 	unsigned long val;
 	int ret;
@@ -152,6 +153,10 @@  static ssize_t btrfs_feature_attr_store(struct kobject *kobj,
 	btrfs_info(fs_info, "%s %s feature flag",
 		   val ? "Setting" : "Clearing", fa->kobj_attr.attr.name);
 
+	trans = btrfs_start_transaction(fs_info->fs_root, 0);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+
 	spin_lock(&fs_info->super_lock);
 	features = get_features(fs_info, fa->feature_set);
 	if (val)
@@ -161,11 +166,9 @@  static ssize_t btrfs_feature_attr_store(struct kobject *kobj,
 	set_features(fs_info, fa->feature_set, features);
 	spin_unlock(&fs_info->super_lock);
 
-	/*
-	 * We don't want to do full transaction commit from inside sysfs
-	 */
-	btrfs_set_pending(fs_info, COMMIT);
-	wake_up_process(fs_info->transaction_kthread);
+	ret = btrfs_commit_transaction(trans, fs_info->fs_root);
+	if (ret)
+		return ret;
 
 	return count;
 }
@@ -369,6 +372,9 @@  static ssize_t btrfs_label_store(struct kobject *kobj,
 				 const char *buf, size_t len)
 {
 	struct btrfs_fs_info *fs_info = to_fs_info(kobj);
+	struct btrfs_trans_handle *trans;
+	struct btrfs_root *root = fs_info->fs_root;
+	int ret;
 	size_t p_len;
 
 	if (fs_info->sb->s_flags & MS_RDONLY)
@@ -383,18 +389,20 @@  static ssize_t btrfs_label_store(struct kobject *kobj,
 	if (p_len >= BTRFS_LABEL_SIZE)
 		return -EINVAL;
 
-	spin_lock(&fs_info->super_lock);
+	trans = btrfs_start_transaction(root, 0);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+
+	spin_lock(&root->fs_info->super_lock);
 	memset(fs_info->super_copy->label, 0, BTRFS_LABEL_SIZE);
 	memcpy(fs_info->super_copy->label, buf, p_len);
-	spin_unlock(&fs_info->super_lock);
+	spin_unlock(&root->fs_info->super_lock);
+	ret = btrfs_commit_transaction(trans, root);
 
-	/*
-	 * We don't want to do full transaction commit from inside sysfs
-	 */
-	btrfs_set_pending(fs_info, COMMIT);
-	wake_up_process(fs_info->transaction_kthread);
+	if (!ret)
+		return len;
 
-	return len;
+	return ret;
 }
 BTRFS_ATTR_RW(label, btrfs_label_show, btrfs_label_store);
 
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index a605d4e..295a135 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1938,10 +1938,13 @@  int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 	}
 
 	/*
-	 * Since the transaction is done, we can apply the pending changes
-	 * before the next transaction.
+	 * Since the transaction is done, we should set the inode map cache flag
+	 * before any other comming transaction.
 	 */
-	btrfs_apply_pending_changes(root->fs_info);
+	if (btrfs_test_opt(root, CHANGE_INODE_CACHE))
+		btrfs_set_opt(root->fs_info->mount_opt, INODE_MAP_CACHE);
+	else
+		btrfs_clear_opt(root->fs_info->mount_opt, INODE_MAP_CACHE);
 
 	/* commit_fs_roots gets rid of all the tree log roots, it is now
 	 * safe to free the root of tree log roots
@@ -2112,32 +2115,3 @@  int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root)
 
 	return (ret < 0) ? 0 : 1;
 }
-
-void btrfs_apply_pending_changes(struct btrfs_fs_info *fs_info)
-{
-	unsigned long prev;
-	unsigned long bit;
-
-	prev = cmpxchg(&fs_info->pending_changes, 0, 0);
-	if (!prev)
-		return;
-
-	bit = 1 << BTRFS_PENDING_SET_INODE_MAP_CACHE;
-	if (prev & bit)
-		btrfs_set_opt(fs_info->mount_opt, INODE_MAP_CACHE);
-	prev &= ~bit;
-
-	bit = 1 << BTRFS_PENDING_CLEAR_INODE_MAP_CACHE;
-	if (prev & bit)
-		btrfs_clear_opt(fs_info->mount_opt, INODE_MAP_CACHE);
-	prev &= ~bit;
-
-	bit = 1 << BTRFS_PENDING_COMMIT;
-	if (prev & bit)
-		btrfs_debug(fs_info, "pending commit done");
-	prev &= ~bit;
-
-	if (prev)
-		btrfs_warn(fs_info,
-			"unknown pending changes left 0x%lx, ignoring", prev);
-}
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 00ed29c..fd400a3 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -170,6 +170,4 @@  int btrfs_wait_marked_extents(struct btrfs_root *root,
 int btrfs_transaction_blocked(struct btrfs_fs_info *info);
 int btrfs_transaction_in_commit(struct btrfs_fs_info *info);
 void btrfs_put_transaction(struct btrfs_transaction *transaction);
-void btrfs_apply_pending_changes(struct btrfs_fs_info *fs_info);
-
 #endif