[RFC] btrfs: change commit txn to end txn in subvol_setflags ioctl

Message ID 20200804175516.2511704-1-boris@bur.io
State New

Commit Message

Boris Burkov Aug. 4, 2020, 5:55 p.m. UTC
Currently, btrfs_ioctl_subvol_setflags forces a btrfs_commit_transaction
while holding subvol_sem. As a result, we have seen workloads where
calling `btrfs property set -ts <subvol> ro false` hangs waiting for a
legitimately slow commit. This gets even worse if the workload tries to
set flags on multiple subvolumes and the ioctls pile up on subvol_sem.

Change the commit to a btrfs_end_transaction so that the ioctl can
return in a timely fashion and piggy back on a later commit.

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/ioctl.c       | 2 +-
 fs/btrfs/transaction.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

Comments

Qu Wenruo Aug. 4, 2020, 10:48 p.m. UTC | #1
On 2020/8/5 上午1:55, Boris Burkov wrote:
> Currently, btrfs_ioctl_subvol_setflags forces a btrfs_commit_transaction
> while holding subvol_sem. As a result, we have seen workloads where
> calling `btrfs property set -ts <subvol> ro false` hangs waiting for a
> legitimately slow commit. This gets even worse if the workload tries to
> set flags on multiple subvolumes and the ioctls pile up on subvol_sem.
> 
> Change the commit to a btrfs_end_transaction so that the ioctl can
> return in a timely fashion and piggy back on a later commit.
> 
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
>  fs/btrfs/ioctl.c       | 2 +-
>  fs/btrfs/transaction.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index bd3511c5ca81..3ae484768ce7 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1985,7 +1985,7 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
>  		goto out_reset;
>  	}
>  
> -	ret = btrfs_commit_transaction(trans);
> +	ret = btrfs_end_transaction(trans);

This means the setflag is not committed to disk, and if a power loss
happens before a transaction commit, then the setflag operation just gets
lost.

This means that previously, once this ioctl returned, users could expect
that the flag was always set no matter what, but now there is no guarantee.

Personally, I'm not sure we really want that operation to be committed
to disk.
Maybe that transaction commit can be initiated from user space, so for
multiple setflags we only commit once, which saves a lot of time.

Thanks,
Qu
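
A minimal user-space sketch of the "commit once for multiple setflags"
idea above, assuming the patch is applied so that the setflags ioctl no
longer commits by itself. The helper names (apply_rdonly,
subvol_set_rdonly, set_rdonly_many_then_sync) are hypothetical; the
ioctls and the BTRFS_SUBVOL_RDONLY flag come from <linux/btrfs.h>. It
also assumes every fd refers to a subvolume on the same filesystem, so
that one BTRFS_IOC_SYNC covers all of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

/* Return `flags` with the read-only bit set (ro) or cleared (!ro). */
static uint64_t apply_rdonly(uint64_t flags, bool ro)
{
	return ro ? (flags | BTRFS_SUBVOL_RDONLY)
		  : (flags & ~(uint64_t)BTRFS_SUBVOL_RDONLY);
}

/*
 * Toggle read-only on one subvolume. `fd` must be an open descriptor
 * on the subvolume's root directory. Returns 0, or -1 with errno set.
 */
static int subvol_set_rdonly(int fd, bool ro)
{
	__u64 flags;

	if (ioctl(fd, BTRFS_IOC_SUBVOL_GETFLAGS, &flags) < 0)
		return -1;
	flags = apply_rdonly(flags, ro);
	return ioctl(fd, BTRFS_IOC_SUBVOL_SETFLAGS, &flags);
}

/*
 * Set the flag on every subvolume first, then ask for a single sync.
 * Stops at the first failure and does not sync in that case.
 * Assumption: all fds live on the same btrfs filesystem.
 */
int set_rdonly_many_then_sync(const int *fds, size_t n, bool ro)
{
	assert(fds != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		if (subvol_set_rdonly(fds[i], ro) < 0)
			return -1;

	/* One sync for the whole batch instead of one commit per ioctl. */
	return n ? ioctl(fds[0], BTRFS_IOC_SYNC) : 0;
}
```

With N subvolumes this pays for one sync rather than N commits, and the
caller decides whether durability is needed at all.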

>  
>  out_reset:
>  	if (ret)
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 20c6ac1a5de7..1dc44209c2ae 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -47,7 +47,7 @@
>   * | Will wait for previous running transaction to completely finish if there
>   * | is one
>   * |
> - * | Then one of the following happes:
> + * | Then one of the following happens:
>   * | - Wait for all other trans handle holders to release.
>   * |   The btrfs_commit_transaction() caller will do the commit work.
>   * | - Wait for current transaction to be committed by others.
> @@ -60,7 +60,7 @@
>   * |
>   * | To next stage:
>   * |  Caller is chosen to commit transaction N, and all other trans handle
> - * |  haven been released.
> + * |  have been released.
>   * V
>   * Transaction N [[TRANS_STATE_COMMIT_DOING]]
>   * |
>
Josef Bacik Aug. 4, 2020, 11:08 p.m. UTC | #2
On 8/4/20 6:48 PM, Qu Wenruo wrote:
>> -	ret = btrfs_commit_transaction(trans);
>> +	ret = btrfs_end_transaction(trans);
> 
> This means the setflag is not committed to disk, and if a power loss
> happens before a transaction commit, then the setflag operation just gets
> lost.
> 
> This means that previously, once this ioctl returned, users could expect
> that the flag was always set no matter what, but now there is no guarantee.
> 
> Personally, I'm not sure we really want that operation to be committed
> to disk.
> Maybe that transaction commit can be initiated from user space, so for
> multiple setflags we only commit once, which saves a lot of time.
> 

I'm of the opinion that we shouldn't be committing the transaction for stuff 
like this, unless there's a really good reason to.  Especially given we're 
holding the subvol lock here, we should just do end_transaction.  Thanks,

Josef
Martin Raiber Aug. 5, 2020, 1:40 p.m. UTC | #3
On 05.08.2020 01:08 Josef Bacik wrote:
>
> I'm of the opinion that we shouldn't be committing the transaction for 
> stuff like this, unless there's a really good reason to. Especially 
> given we're holding the subvol lock here, we should just do 
> end_transaction.  Thanks,
From a user perspective, I'd appreciate having the option to set it in a
non-durable way (I have seen btrfs property sets hanging for a long time
as well). But currently my application kind of depends on it being
durable. Making it non-durable wouldn't break much, and I guess the old
behaviour could be emulated by a "btrfs fi sync <subvol>" afterwards,
but I don't know how much other stuff depends on it being durable.
Making it consistent with btrfs subvol del and its "-c" switch would be
nice as well (and the -c switch could be done via IOC_SYNC after
setting the properties).
Boris Burkov Aug. 7, 2020, 8:45 p.m. UTC | #4
On Wed, Aug 05, 2020 at 01:40:16PM +0000, Martin Raiber wrote:
> From a user perspective I'd appreciate having the option to set it
> in a non-durable way (I have seen btrfs property sets hanging for a
> long time as well). But currently my application kind of depends on
> it being durable. Making it non-durable wouldn't break much and I
> guess the old behaviour could be emulated by a "btrfs fi sync
> <subvol>" afterwards, but idk how much other stuff depends on it
> being durable. Making it consistent with btrfs subvol del with the
> "-c" switch would be nice and consistent as well (and the -c switch
> could be done via IOC_SYNC after setting the properties).

Martin,

Thanks for your perspective, that's helpful. Could you elaborate on how
your application relies on the durability? I would just like to learn
more about how this might affect people.

I really like the -c idea, but I fear if people are broadly depending on
that behavior by default, it wouldn't be enough.
Martin Raiber Aug. 10, 2020, 6:05 p.m. UTC | #5
On 07.08.2020 22:45 Boris Burkov wrote:
> Martin,
>
> Thanks for your perspective, that's helpful. Could you elaborate on how
> your application relies on the durability? I would just like to learn
> more about how this might affect people.
>
> I really like the -c idea, but I fear if people are broadly depending on
> that behavior by default, it wouldn't be enough.

It is a backup software that currently works a bit like this:

1. Add database entry for new backup A with done=0
2. Create btrfs subvol A for backup
3. rsync backup source to A
4. btrfs fi sync A
5. Set subvol A to read-only
6. Set database entry for A to done=1

On startup: Delete all btrfs subvols of backups where done!=1 in the 
database.

Switching 4. and 5. should fix it if changing properties is not durable.
Otherwise, after a crash there could be subvols that are marked done=1
(so they don't get deleted on startup) but are not read-only. Those
would be an annoyance, e.g. if the backups are further replicated using
btrfs send/receive, or if one relies on the finished backups being
read-only.

Worst case, there is someone who leaves 4. out and relies on 5. to sync
to disk (would that work?).
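
The reordering of steps 4 and 5 described above can be sketched roughly
as follows. This is an illustration under stated assumptions, not the
backup software's actual code: finalize_backup, mark_done_fn and
count_done are hypothetical names, and the database update of step 6 is
represented by a caller-supplied callback. The ioctls and
BTRFS_SUBVOL_RDONLY are from <linux/btrfs.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

/* Hypothetical stand-in for step 6 ("set database entry to done=1"). */
typedef int (*mark_done_fn)(void *ctx);

/* Example callback: counts invocations through an int pointed to by ctx. */
int count_done(void *ctx)
{
	++*(int *)ctx;
	return 0;
}

/*
 * Finish a backup on the subvolume opened as `subvol_fd`:
 * read-only flag first, sync second, database update last.
 * Returns 0 on success, -1 if any step fails; later steps are skipped.
 */
int finalize_backup(int subvol_fd, mark_done_fn mark_done, void *ctx)
{
	__u64 flags;

	assert(mark_done != NULL);

	/* Former step 5, now first: mark the subvolume read-only. */
	if (ioctl(subvol_fd, BTRFS_IOC_SUBVOL_GETFLAGS, &flags) < 0)
		return -1;
	flags |= BTRFS_SUBVOL_RDONLY;
	if (ioctl(subvol_fd, BTRFS_IOC_SUBVOL_SETFLAGS, &flags) < 0)
		return -1;

	/* Former step 4, now second: the sync covers data and flag alike. */
	if (ioctl(subvol_fd, BTRFS_IOC_SYNC) < 0)
		return -1;

	/* Step 6 runs only once the read-only flag is known to be on disk. */
	return mark_done(ctx);
}
```

Because done=1 is only recorded after the sync succeeds, a crash at any
earlier point leaves the entry at done=0 and the startup cleanup deletes
the subvolume as before.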
Boris Burkov Aug. 25, 2020, 8:23 p.m. UTC | #6
On Mon, Aug 10, 2020 at 06:05:41PM +0000, Martin Raiber wrote:
> 
> It is a backup software that currently works a bit like this:
> 
> 1. Add database entry for new backup A with done=0
> 2. Create btrfs subvol A for backup
> 3. rsync backup source to A
> 4. btrfs fi sync A
> 5. Set subvol A to read-only
> 6. Set database entry for A to done=1
> 
> On startup: Delete all btrfs subvols of backups where done!=1 in the
> database.
> 
> Switching 4. and 5. should fix it if changing properties is not
> durable. Otherwise there could be subvols that don't get deleted on
> startup (after crash) and are not read-only. Those would be an
> annoyance e.g. if the backups are further replicated using btrfs
> send/receive, or if one relies on the finished backups being
> read-only.
> 
> Worst case there is someone that leaves 4. out and relies on 5. to
> sync to disk (would that work?).
> 
> 
Thanks for the extra detail, that example makes sense.

As I see it, our options to move forward are:
1. Leave the sync; suffer hung setflags calls, but no regression.
2a. Use this patch; risk affecting use cases like Martin's.
2b. Also add a -c option to btrfs subvol setflags for people to move to.
3. Add an 'async' option to btrfs subvol setflags that people can use if
they're affected by the issue this patch fixes.
4. Introduce a new command for setting a subvol read-only with a -c flag.

1 is a bummer as it doesn't move us towards fewer unneeded syncs. 2a/b
are "easy" but I fear they might be too risky. 3 and 4 introduce
different kinds of ugliness into the user interface, but don't
negatively affect existing use cases. I'm happy to write up whichever
variant people think is best.
Josef Bacik Aug. 26, 2020, 2:23 p.m. UTC | #7
On 8/4/20 1:55 PM, Boris Burkov wrote:
> Currently, btrfs_ioctl_subvol_setflags forces a btrfs_commit_transaction
> while holding subvol_sem. As a result, we have seen workloads where
> calling `btrfs property set -ts <subvol> ro false` hangs waiting for a
> legitimately slow commit. This gets even worse if the workload tries to
> set flags on multiple subvolumes and the ioctls pile up on subvol_sem.
> 
> Change the commit to a btrfs_end_transaction so that the ioctl can
> return in a timely fashion and piggy back on a later commit.
> 
> Signed-off-by: Boris Burkov <boris@bur.io>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

I think we should follow up with a btrfs-progs patch to make syncing an
option with setflags (or hell, do it by default and make the option to
not sync).  Having the commit here was arbitrary and not needed.  Thanks,

Josef

Patch

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index bd3511c5ca81..3ae484768ce7 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1985,7 +1985,7 @@  static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
 		goto out_reset;
 	}
 
-	ret = btrfs_commit_transaction(trans);
+	ret = btrfs_end_transaction(trans);
 
 out_reset:
 	if (ret)
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 20c6ac1a5de7..1dc44209c2ae 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -47,7 +47,7 @@ 
  * | Will wait for previous running transaction to completely finish if there
  * | is one
  * |
- * | Then one of the following happes:
+ * | Then one of the following happens:
  * | - Wait for all other trans handle holders to release.
  * |   The btrfs_commit_transaction() caller will do the commit work.
  * | - Wait for current transaction to be committed by others.
@@ -60,7 +60,7 @@ 
  * |
  * | To next stage:
  * |  Caller is chosen to commit transaction N, and all other trans handle
- * |  haven been released.
+ * |  have been released.
  * V
  * Transaction N [[TRANS_STATE_COMMIT_DOING]]
  * |