diff mbox series

btrfs: don't end the transaction for delayed refs in throttle

Message ID 20190124143143.8838-1-josef@toxicpanda.com (mailing list archive)
State New, archived
Headers show
Series btrfs: don't end the transaction for delayed refs in throttle | expand

Commit Message

Josef Bacik Jan. 24, 2019, 2:31 p.m. UTC
Previously callers to btrfs_end_transaction_throttle() would commit the
transaction if there wasn't enough delayed refs space.  This happens in
relocation, and if the fs is relatively empty we'll run out of delayed
refs space basically immediately, so we'll just be stuck in this loop of
committing the transaction over and over again.

This code existed because we didn't have a good feedback mechanism for
running delayed refs, but with the delayed refs rsv we do now.  Delete
this throttling code and let the btrfs_start_transaction() in relocation
deal with putting pressure on the delayed refs infrastructure.  With
this patch we no longer take 5 minutes to balance a metadata only fs.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/transaction.c | 8 --------
 1 file changed, 8 deletions(-)

Comments

David Sterba Feb. 12, 2019, 4:03 p.m. UTC | #1
On Thu, Jan 24, 2019 at 09:31:43AM -0500, Josef Bacik wrote:
> Previously callers to btrfs_end_transaction_throttle() would commit the
> transaction if there wasn't enough delayed refs space.  This happens in
> relocation, and if the fs is relatively empty we'll run out of delayed
> refs space basically immediately, so we'll just be stuck in this loop of
> committing the transaction over and over again.
> 
> This code existed because we didn't have a good feedback mechanism for
> running delayed refs, but with the delayed refs rsv we do now.  Delete
> this throttling code and let the btrfs_start_transaction() in relocation
> deal with putting pressure on the delayed refs infrastructure.  With
> this patch we no longer take 5 minutes to balance a metadata only fs.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

For the record, this has been merged to 5.0-rc5
Qu Wenruo June 3, 2019, 6:53 a.m. UTC | #2
On 2019/2/13 上午12:03, David Sterba wrote:
> On Thu, Jan 24, 2019 at 09:31:43AM -0500, Josef Bacik wrote:
>> Previously callers to btrfs_end_transaction_throttle() would commit the
>> transaction if there wasn't enough delayed refs space.  This happens in
>> relocation, and if the fs is relatively empty we'll run out of delayed
>> refs space basically immediately, so we'll just be stuck in this loop of
>> committing the transaction over and over again.
>>
>> This code existed because we didn't have a good feedback mechanism for
>> running delayed refs, but with the delayed refs rsv we do now.  Delete
>> this throttling code and let the btrfs_start_transaction() in relocation
>> deal with putting pressure on the delayed refs infrastructure.  With
>> this patch we no longer take 5 minutes to balance a metadata only fs.
>>
>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> 
> For the record, this has been merged to 5.0-rc5
> 

Bisecting leads me to this patch for strange balance ENOSPC.

Can be reproduced by btrfs/156, or the following small script:
------
#!/bin/bash
dev="/dev/test/test"
mnt="/mnt/btrfs"

_fail()
{
	echo "!!! FAILED: $@ !!!"
	exit 1
}

do_work()
{
	umount $dev &> /dev/null
	umount $mnt &> /dev/null

	mkfs.btrfs -b 1G -m single -d single $dev -f > /dev/null

	mount $dev $mnt

	for i in $(seq -w 0 511); do
	#	xfs_io -f -c "falloc 0 1m" $mnt/file_$i > /dev/null
		xfs_io -f -c "pwrite 0 1m" $mnt/inline_$i > /dev/null
	done
	sync

	btrfs balance start --full $mnt || return 1
	sync


	btrfs balance start --full $mnt || return 1
	umount $mnt
}

failed=0
for i in $(seq -w 0 24); do
	echo "=== run $i ==="
	do_work
	if [ $? -eq 1 ]; then
		failed=$(($failed + 1))
	fi
done
if [ $failed -ne 0 ]; then
	echo "!!! failed $failed/25 !!!"
else
	echo "=== all passes ==="
fi
------

For v4.20, it will fail at the rate around 0/25 ~ 2/25 (very rare).
But at that patch (upstream commit
302167c50b32e7fccc98994a91d40ddbbab04e52), the failure rate raise to 25/25.

Any idea for that ENOSPC problem?
As it looks really wired for the 2nd full balance to fail even we have
enough unallocated space.

Thanks,
Qu
Josef Bacik June 3, 2019, 5:36 p.m. UTC | #3
On Mon, Jun 03, 2019 at 02:53:00PM +0800, Qu Wenruo wrote:
> 
> 
> On 2019/2/13 上午12:03, David Sterba wrote:
> > On Thu, Jan 24, 2019 at 09:31:43AM -0500, Josef Bacik wrote:
> >> Previously callers to btrfs_end_transaction_throttle() would commit the
> >> transaction if there wasn't enough delayed refs space.  This happens in
> >> relocation, and if the fs is relatively empty we'll run out of delayed
> >> refs space basically immediately, so we'll just be stuck in this loop of
> >> committing the transaction over and over again.
> >>
> >> This code existed because we didn't have a good feedback mechanism for
> >> running delayed refs, but with the delayed refs rsv we do now.  Delete
> >> this throttling code and let the btrfs_start_transaction() in relocation
> >> deal with putting pressure on the delayed refs infrastructure.  With
> >> this patch we no longer take 5 minutes to balance a metadata only fs.
> >>
> >> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > 
> > For the record, this has been merged to 5.0-rc5
> > 
> 
> Bisecting leads me to this patch for strange balance ENOSPC.
> 
> Can be reproduced by btrfs/156, or the following small script:
> ------
> #!/bin/bash
> dev="/dev/test/test"
> mnt="/mnt/btrfs"
> 
> _fail()
> {
> 	echo "!!! FAILED: $@ !!!"
> 	exit 1
> }
> 
> do_work()
> {
> 	umount $dev &> /dev/null
> 	umount $mnt &> /dev/null
> 
> 	mkfs.btrfs -b 1G -m single -d single $dev -f > /dev/null
> 
> 	mount $dev $mnt
> 
> 	for i in $(seq -w 0 511); do
> 	#	xfs_io -f -c "falloc 0 1m" $mnt/file_$i > /dev/null
> 		xfs_io -f -c "pwrite 0 1m" $mnt/inline_$i > /dev/null
> 	done
> 	sync
> 
> 	btrfs balance start --full $mnt || return 1
> 	sync
> 
> 
> 	btrfs balance start --full $mnt || return 1
> 	umount $mnt
> }
> 
> failed=0
> for i in $(seq -w 0 24); do
> 	echo "=== run $i ==="
> 	do_work
> 	if [ $? -eq 1 ]; then
> 		failed=$(($failed + 1))
> 	fi
> done
> if [ $failed -ne 0 ]; then
> 	echo "!!! failed $failed/25 !!!"
> else
> 	echo "=== all passes ==="
> fi
> ------
> 
> For v4.20, it will fail at the rate around 0/25 ~ 2/25 (very rare).
> But at that patch (upstream commit
> 302167c50b32e7fccc98994a91d40ddbbab04e52), the failure rate raise to 25/25.
> 
> Any idea for that ENOSPC problem?
> As it looks really wired for the 2nd full balance to fail even we have
> enough unallocated space.
> 

I've been running this all morning on kdave's misc-next and not had a single
failure.  I ran it a few times on spinning rust and a few times on my nvme
drive.  I wouldn't doubt that it's failing for you, but I can't reproduce.  It
would be helpful to know where the ENOSPC was coming from so I can think of
where the problem might be.  Thanks,

Josef
David Sterba June 3, 2019, 6:43 p.m. UTC | #4
On Mon, Jun 03, 2019 at 01:36:11PM -0400, Josef Bacik wrote:
> > For v4.20, it will fail at the rate around 0/25 ~ 2/25 (very rare).
> > But at that patch (upstream commit
> > 302167c50b32e7fccc98994a91d40ddbbab04e52), the failure rate raise to 25/25.
> > 
> > Any idea for that ENOSPC problem?
> > As it looks really wired for the 2nd full balance to fail even we have
> > enough unallocated space.
> > 
> 
> I've been running this all morning on kdave's misc-next and not had a single
> failure.  I ran it a few times on spinning rust and a few times on my nvme
> drive.  I wouldn't doubt that it's failing for you, but I can't reproduce.  It
> would be helpful to know where the ENOSPC was coming from so I can think of
> where the problem might be.  Thanks,

That's interesting because the test btrfs/156 hasn't succesfully
finished a single run on my VM testing setup since qemu 4.0 implemented
discard on virtio devices several weeks ago. It's 4cpu/2g, file-backed
virtio-scsi devices, files are on a spinning disk.
Qu Wenruo June 4, 2019, 12:31 a.m. UTC | #5
On 2019/6/4 上午1:36, Josef Bacik wrote:
> On Mon, Jun 03, 2019 at 02:53:00PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2019/2/13 上午12:03, David Sterba wrote:
>>> On Thu, Jan 24, 2019 at 09:31:43AM -0500, Josef Bacik wrote:
>>>> Previously callers to btrfs_end_transaction_throttle() would commit the
>>>> transaction if there wasn't enough delayed refs space.  This happens in
>>>> relocation, and if the fs is relatively empty we'll run out of delayed
>>>> refs space basically immediately, so we'll just be stuck in this loop of
>>>> committing the transaction over and over again.
>>>>
>>>> This code existed because we didn't have a good feedback mechanism for
>>>> running delayed refs, but with the delayed refs rsv we do now.  Delete
>>>> this throttling code and let the btrfs_start_transaction() in relocation
>>>> deal with putting pressure on the delayed refs infrastructure.  With
>>>> this patch we no longer take 5 minutes to balance a metadata only fs.
>>>>
>>>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>>>
>>> For the record, this has been merged to 5.0-rc5
>>>
>>
>> Bisecting leads me to this patch for strange balance ENOSPC.
>>
>> Can be reproduced by btrfs/156, or the following small script:
>> ------
>> #!/bin/bash
>> dev="/dev/test/test"
>> mnt="/mnt/btrfs"
>>
>> _fail()
>> {
>> 	echo "!!! FAILED: $@ !!!"
>> 	exit 1
>> }
>>
>> do_work()
>> {
>> 	umount $dev &> /dev/null
>> 	umount $mnt &> /dev/null
>>
>> 	mkfs.btrfs -b 1G -m single -d single $dev -f > /dev/null
>>
>> 	mount $dev $mnt
>>
>> 	for i in $(seq -w 0 511); do
>> 	#	xfs_io -f -c "falloc 0 1m" $mnt/file_$i > /dev/null
>> 		xfs_io -f -c "pwrite 0 1m" $mnt/inline_$i > /dev/null
>> 	done
>> 	sync
>>
>> 	btrfs balance start --full $mnt || return 1
>> 	sync
>>
>>
>> 	btrfs balance start --full $mnt || return 1
>> 	umount $mnt
>> }
>>
>> failed=0
>> for i in $(seq -w 0 24); do
>> 	echo "=== run $i ==="
>> 	do_work
>> 	if [ $? -eq 1 ]; then
>> 		failed=$(($failed + 1))
>> 	fi
>> done
>> if [ $failed -ne 0 ]; then
>> 	echo "!!! failed $failed/25 !!!"
>> else
>> 	echo "=== all passes ==="
>> fi
>> ------
>>
>> For v4.20, it will fail at the rate around 0/25 ~ 2/25 (very rare).
>> But at that patch (upstream commit
>> 302167c50b32e7fccc98994a91d40ddbbab04e52), the failure rate raise to 25/25.
>>
>> Any idea for that ENOSPC problem?
>> As it looks really wired for the 2nd full balance to fail even we have
>> enough unallocated space.
>>
> 
> I've been running this all morning on kdave's misc-next and not had a single
> failure.  I ran it a few times on spinning rust and a few times on my nvme
> drive.  I wouldn't doubt that it's failing for you, but I can't reproduce.  It
> would be helpful to know where the ENOSPC was coming from so I can think of
> where the problem might be.  Thanks,
> 
> Josef
> 

Since v5.2-rc2 has a lot of enospc debug output merged, here is the
debug info just by enospc_debug:

BTRFS: device fsid defe70f2-d083-41f0-a4fd-28a0cc03dce7 devid 1 transid
5 /dev/test/test
BTRFS info (device dm-3): disk space caching is enabled
BTRFS info (device dm-3): has skinny extents
BTRFS info (device dm-3): flagging fs with big metadata feature
BTRFS info (device dm-3): checking UUID tree
BTRFS info (device dm-3): balance: start -d -m -s
BTRFS info (device dm-3): relocating block group 726663168 flags metadata
BTRFS info (device dm-3): relocating block group 609222656 flags data
BTRFS info (device dm-3): found 57 extents
BTRFS info (device dm-3): found 57 extents
BTRFS info (device dm-3): relocating block group 491782144 flags data
BTRFS info (device dm-3): found 112 extents
BTRFS info (device dm-3): found 112 extents
BTRFS info (device dm-3): relocating block group 374341632 flags data
BTRFS info (device dm-3): found 115 extents
BTRFS info (device dm-3): found 114 extents
BTRFS info (device dm-3): relocating block group 256901120 flags data
BTRFS info (device dm-3): found 112 extents
BTRFS info (device dm-3): found 112 extents
BTRFS info (device dm-3): relocating block group 139460608 flags data
BTRFS info (device dm-3): found 112 extents
BTRFS info (device dm-3): found 112 extents
BTRFS info (device dm-3): unable to make block group 22020096 ro
BTRFS info (device dm-3): sinfo_used=42909696 bg_num_bytes=116293632
min_allocable=1048576
BTRFS info (device dm-3): space_info 4 has 82919424 free, is not full
BTRFS info (device dm-3): space_info total=125829120, used=1015808,
pinned=0, reserved=81920, may_use=41746432, readonly=65536
BTRFS info (device dm-3): global_block_rsv: size 16777216 reserved 16744448
BTRFS info (device dm-3): trans_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): chunk_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): delayed_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): delayed_refs_rsv: size 61865984 reserved 25001984
BTRFS info (device dm-3): relocating block group 22020096 flags metadata
BTRFS info (device dm-3): found 54 extents
BTRFS info (device dm-3): relocating block group 13631488 flags data
BTRFS info (device dm-3): found 8 extents
BTRFS info (device dm-3): found 8 extents
BTRFS info (device dm-3): relocating block group 5242880 flags metadata
BTRFS info (device dm-3): found 56 extents
BTRFS info (device dm-3): unable to make block group 1048576 ro
BTRFS info (device dm-3): sinfo_used=32768 bg_num_bytes=4161536
min_allocable=1048576
BTRFS info (device dm-3): space_info 2 has 4161536 free, is not full
BTRFS info (device dm-3): space_info total=4194304, used=16384,
pinned=0, reserved=16384, may_use=0, readonly=0
BTRFS info (device dm-3): global_block_rsv: size 16777216 reserved 16744448
BTRFS info (device dm-3): trans_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): chunk_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): delayed_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): delayed_refs_rsv: size 3145728 reserved 1540096
BTRFS info (device dm-3): relocating block group 1048576 flags system
BTRFS info (device dm-3): found 1 extents
BTRFS info (device dm-3): balance: ended with status: 0
BTRFS info (device dm-3): balance: start -d -m -s
BTRFS info (device dm-3): unable to make block group 1431306240 ro
BTRFS info (device dm-3): sinfo_used=16384 bg_num_bytes=33538048
min_allocable=1048576
BTRFS info (device dm-3): space_info 2 has 33538048 free, is not full
BTRFS info (device dm-3): space_info total=33554432, used=16384,
pinned=0, reserved=0, may_use=0, readonly=0
BTRFS info (device dm-3): global_block_rsv: size 16777216 reserved 16777216
BTRFS info (device dm-3): trans_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): chunk_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): delayed_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): delayed_refs_rsv: size 0 reserved 0
BTRFS info (device dm-3): relocating block group 1431306240 flags system
BTRFS info (device dm-3): unable to make block group 1313865728 ro
BTRFS info (device dm-3): sinfo_used=19382272 bg_num_bytes=116342784
min_allocable=1048576
BTRFS info (device dm-3): space_info 4 has 98058240 free, is not full
BTRFS info (device dm-3): space_info total=117440512, used=1015808,
pinned=0, reserved=81920, may_use=18284544, readonly=0
BTRFS info (device dm-3): global_block_rsv: size 16777216 reserved 16744448
BTRFS info (device dm-3): trans_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): chunk_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): delayed_block_rsv: size 0 reserved 0
BTRFS info (device dm-3): delayed_refs_rsv: size 3145728 reserved 1540096
BTRFS info (device dm-3): relocating block group 1313865728 flags metadata
BTRFS info (device dm-3): found 55 extents
BTRFS info (device dm-3): relocating block group 1196425216 flags data
BTRFS info (device dm-3): found 65 extents
BTRFS info (device dm-3): found 65 extents
BTRFS warning (device dm-3): no space to allocate a new chunk for block
group 1078984704
BTRFS warning (device dm-3): no space to allocate a new chunk for block
group 961544192
BTRFS warning (device dm-3): no space to allocate a new chunk for block
group 844103680
BTRFS warning (device dm-3): no space to allocate a new chunk for block
group 726663168
BTRFS info (device dm-3): 4 enospc errors during balance
BTRFS info (device dm-3): balance: ended with status: -28

The ENOSPC is still from inc_block_group_ro().
For the first data block failure, it still looks like something in
bytes_may_use doesn't look correct.

Although for system block group, it's another story. It has no
reserved/pinned/may_use bytes, it's the min_allocable failing the check.

I can't remember if this is some bug I reported before, but looks a
little similar.

Thanks,
Qu
Josef Bacik June 4, 2019, 5:43 p.m. UTC | #6
On Tue, Jun 04, 2019 at 08:31:23AM +0800, Qu Wenruo wrote:
> 
> 
> On 2019/6/4 上午1:36, Josef Bacik wrote:
> > On Mon, Jun 03, 2019 at 02:53:00PM +0800, Qu Wenruo wrote:
> >>
> >>
> >> On 2019/2/13 上午12:03, David Sterba wrote:
> >>> On Thu, Jan 24, 2019 at 09:31:43AM -0500, Josef Bacik wrote:
> >>>> Previously callers to btrfs_end_transaction_throttle() would commit the
> >>>> transaction if there wasn't enough delayed refs space.  This happens in
> >>>> relocation, and if the fs is relatively empty we'll run out of delayed
> >>>> refs space basically immediately, so we'll just be stuck in this loop of
> >>>> committing the transaction over and over again.
> >>>>
> >>>> This code existed because we didn't have a good feedback mechanism for
> >>>> running delayed refs, but with the delayed refs rsv we do now.  Delete
> >>>> this throttling code and let the btrfs_start_transaction() in relocation
> >>>> deal with putting pressure on the delayed refs infrastructure.  With
> >>>> this patch we no longer take 5 minutes to balance a metadata only fs.
> >>>>
> >>>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> >>>
> >>> For the record, this has been merged to 5.0-rc5
> >>>
> >>
> >> Bisecting leads me to this patch for strange balance ENOSPC.
> >>
> >> Can be reproduced by btrfs/156, or the following small script:
> >> ------
> >> #!/bin/bash
> >> dev="/dev/test/test"
> >> mnt="/mnt/btrfs"
> >>
> >> _fail()
> >> {
> >> 	echo "!!! FAILED: $@ !!!"
> >> 	exit 1
> >> }
> >>
> >> do_work()
> >> {
> >> 	umount $dev &> /dev/null
> >> 	umount $mnt &> /dev/null
> >>
> >> 	mkfs.btrfs -b 1G -m single -d single $dev -f > /dev/null
> >>
> >> 	mount $dev $mnt
> >>
> >> 	for i in $(seq -w 0 511); do
> >> 	#	xfs_io -f -c "falloc 0 1m" $mnt/file_$i > /dev/null
> >> 		xfs_io -f -c "pwrite 0 1m" $mnt/inline_$i > /dev/null
> >> 	done
> >> 	sync
> >>
> >> 	btrfs balance start --full $mnt || return 1
> >> 	sync
> >>
> >>
> >> 	btrfs balance start --full $mnt || return 1
> >> 	umount $mnt
> >> }
> >>
> >> failed=0
> >> for i in $(seq -w 0 24); do
> >> 	echo "=== run $i ==="
> >> 	do_work
> >> 	if [ $? -eq 1 ]; then
> >> 		failed=$(($failed + 1))
> >> 	fi
> >> done
> >> if [ $failed -ne 0 ]; then
> >> 	echo "!!! failed $failed/25 !!!"
> >> else
> >> 	echo "=== all passes ==="
> >> fi
> >> ------
> >>
> >> For v4.20, it will fail at the rate around 0/25 ~ 2/25 (very rare).
> >> But at that patch (upstream commit
> >> 302167c50b32e7fccc98994a91d40ddbbab04e52), the failure rate raise to 25/25.
> >>
> >> Any idea for that ENOSPC problem?
> >> As it looks really wired for the 2nd full balance to fail even we have
> >> enough unallocated space.
> >>
> > 
> > I've been running this all morning on kdave's misc-next and not had a single
> > failure.  I ran it a few times on spinning rust and a few times on my nvme
> > drive.  I wouldn't doubt that it's failing for you, but I can't reproduce.  It
> > would be helpful to know where the ENOSPC was coming from so I can think of
> > where the problem might be.  Thanks,
> > 
> > Josef
> > 
> 
> Since v5.2-rc2 has a lot of enospc debug output merged, here is the
> debug info just by enospc_debug:
>

Ah ok, sorry I'm travelling so I can't easily test a patch right now, but change
the btrfs_join_transaction() in btrfs_inc_block_group_ro to
btrfs_start_transaction(root, 0);  This will trigger the delayed ref flushing if
we need it and likely will fix the problem.  There's so much random cruft built
into the relocation enospc stuff that we're likely to keep finding problems like
this, we just need to rework it so it's still tripping over the normal
reservation path.  Thanks,

Josef
Qu Wenruo July 4, 2019, 6:56 a.m. UTC | #7
On 2019/6/5 上午1:43, Josef Bacik wrote:
> On Tue, Jun 04, 2019 at 08:31:23AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2019/6/4 上午1:36, Josef Bacik wrote:
>>> On Mon, Jun 03, 2019 at 02:53:00PM +0800, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2019/2/13 上午12:03, David Sterba wrote:
>>>>> On Thu, Jan 24, 2019 at 09:31:43AM -0500, Josef Bacik wrote:
>>>>>> Previously callers to btrfs_end_transaction_throttle() would commit the
>>>>>> transaction if there wasn't enough delayed refs space.  This happens in
>>>>>> relocation, and if the fs is relatively empty we'll run out of delayed
>>>>>> refs space basically immediately, so we'll just be stuck in this loop of
>>>>>> committing the transaction over and over again.
>>>>>>
>>>>>> This code existed because we didn't have a good feedback mechanism for
>>>>>> running delayed refs, but with the delayed refs rsv we do now.  Delete
>>>>>> this throttling code and let the btrfs_start_transaction() in relocation
>>>>>> deal with putting pressure on the delayed refs infrastructure.  With
>>>>>> this patch we no longer take 5 minutes to balance a metadata only fs.
>>>>>>
>>>>>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>>>>>
>>>>> For the record, this has been merged to 5.0-rc5
>>>>>
>>>>
>>>> Bisecting leads me to this patch for strange balance ENOSPC.
>>>>
>>>> Can be reproduced by btrfs/156, or the following small script:
>>>> ------
>>>> #!/bin/bash
>>>> dev="/dev/test/test"
>>>> mnt="/mnt/btrfs"
>>>>
>>>> _fail()
>>>> {
>>>> 	echo "!!! FAILED: $@ !!!"
>>>> 	exit 1
>>>> }
>>>>
>>>> do_work()
>>>> {
>>>> 	umount $dev &> /dev/null
>>>> 	umount $mnt &> /dev/null
>>>>
>>>> 	mkfs.btrfs -b 1G -m single -d single $dev -f > /dev/null
>>>>
>>>> 	mount $dev $mnt
>>>>
>>>> 	for i in $(seq -w 0 511); do
>>>> 	#	xfs_io -f -c "falloc 0 1m" $mnt/file_$i > /dev/null
>>>> 		xfs_io -f -c "pwrite 0 1m" $mnt/inline_$i > /dev/null
>>>> 	done
>>>> 	sync
>>>>
>>>> 	btrfs balance start --full $mnt || return 1
>>>> 	sync
>>>>
>>>>
>>>> 	btrfs balance start --full $mnt || return 1
>>>> 	umount $mnt
>>>> }
>>>>
>>>> failed=0
>>>> for i in $(seq -w 0 24); do
>>>> 	echo "=== run $i ==="
>>>> 	do_work
>>>> 	if [ $? -eq 1 ]; then
>>>> 		failed=$(($failed + 1))
>>>> 	fi
>>>> done
>>>> if [ $failed -ne 0 ]; then
>>>> 	echo "!!! failed $failed/25 !!!"
>>>> else
>>>> 	echo "=== all passes ==="
>>>> fi
>>>> ------
>>>>
>>>> For v4.20, it will fail at the rate around 0/25 ~ 2/25 (very rare).
>>>> But at that patch (upstream commit
>>>> 302167c50b32e7fccc98994a91d40ddbbab04e52), the failure rate raise to 25/25.
>>>>
>>>> Any idea for that ENOSPC problem?
>>>> As it looks really wired for the 2nd full balance to fail even we have
>>>> enough unallocated space.
>>>>
>>>
>>> I've been running this all morning on kdave's misc-next and not had a single
>>> failure.  I ran it a few times on spinning rust and a few times on my nvme
>>> drive.  I wouldn't doubt that it's failing for you, but I can't reproduce.  It
>>> would be helpful to know where the ENOSPC was coming from so I can think of
>>> where the problem might be.  Thanks,
>>>
>>> Josef
>>>
>>
>> Since v5.2-rc2 has a lot of enospc debug output merged, here is the
>> debug info just by enospc_debug:
>>
> 
> Ah ok, sorry I'm travelling so I can't easily test a patch right now, but change
> the btrfs_join_transaction() in btrfs_inc_block_group_ro to
> btrfs_start_transaction(root, 0);  This will trigger the delayed ref flushing if
> we need it and likely will fix the problem.

Unfortunately, it doesn't work as expected.

False ENOSPC still gets triggered.

The same problem for system chunk, min_alloc is causing problem, as
pinned/reserved/may_use are all zero.
Just the min_allocable of 1M makes the check to fail.

While for metadata, btrfs_start_transaction(root, 0) doesn't solve the
problem as reserved and may_use is still relatively high.

I'll dig further to find a different way to solve it.

Thanks,
Qu

>  There's so much random cruft built
> into the relocation enospc stuff that we're likely to keep finding problems like
> this, we just need to rework it so it's still tripping over the normal
> reservation path.  Thanks,
> 
> Josef 
>
diff mbox series

Patch

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 22b0dacae003..ef98b3cd30db 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -851,14 +851,6 @@  static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 
 	btrfs_trans_release_chunk_metadata(trans);
 
-	if (lock && should_end_transaction(trans) &&
-	    READ_ONCE(cur_trans->state) == TRANS_STATE_RUNNING) {
-		spin_lock(&info->trans_lock);
-		if (cur_trans->state == TRANS_STATE_RUNNING)
-			cur_trans->state = TRANS_STATE_BLOCKED;
-		spin_unlock(&info->trans_lock);
-	}
-
 	if (lock && READ_ONCE(cur_trans->state) == TRANS_STATE_BLOCKED) {
 		if (throttle)
 			return btrfs_commit_transaction(trans);