Message ID | 20160720055637.7275-5-wangxg.fnst@cn.fujitsu.com (mailing list archive) |
---|---|
State | Superseded |
On 07/20/2016 01:56 AM, Wang Xiaoguang wrote:

[...]

> @@ -6328,13 +6313,11 @@ static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
>  	} else {
>  		cache->reserved += num_bytes;
>  		space_info->bytes_reserved += num_bytes;
> -		if (reserve == RESERVE_ALLOC) {
> -			trace_btrfs_space_reservation(cache->fs_info,
> -					"space_info", space_info->flags,
> -					num_bytes, 0);
> -			space_info->bytes_may_use -= num_bytes;
> -		}
>
> +		trace_btrfs_space_reservation(cache->fs_info,
> +				"space_info", space_info->flags,
> +				num_bytes, 0);

This needs to be ram_bytes to keep the accounting consistent for tools
that use these tracepoints. Thanks,

Josef
hello,

On 07/20/2016 09:35 PM, Josef Bacik wrote:
> On 07/20/2016 01:56 AM, Wang Xiaoguang wrote:

[...]

>> +		trace_btrfs_space_reservation(cache->fs_info,
>> +				"space_info", space_info->flags,
>> +				num_bytes, 0);
>
> This needs to be ram_bytes to keep the accounting consistent for tools
> that use these tracepoints. Thanks,

OK, I'll fix this issue in later version.

Regards,
Xiaoguang Wang

>
> Josef
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4274a7b..7eb2913 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2556,7 +2556,7 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
 				   struct btrfs_root *root,
 				   u64 root_objectid, u64 owner, u64 offset,
 				   struct btrfs_key *ins);
-int btrfs_reserve_extent(struct btrfs_root *root, u64 num_bytes,
+int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes, u64 num_bytes,
 			 u64 min_alloc_size, u64 empty_size, u64 hint_byte,
 			 struct btrfs_key *ins, int is_data, int delalloc);
 int btrfs_inc_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8eaac39..5447973 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -60,21 +60,6 @@ enum {
 	CHUNK_ALLOC_FORCE = 2,
 };
 
-/*
- * Control how reservations are dealt with.
- *
- *  RESERVE_FREE - freeing a reservation.
- *  RESERVE_ALLOC - allocating space and we need to update bytes_may_use for
- *	ENOSPC accounting
- *  RESERVE_ALLOC_NO_ACCOUNT - allocating space and we should not update
- *	bytes_may_use as the ENOSPC accounting is done elsewhere
- */
-enum {
-	RESERVE_FREE = 0,
-	RESERVE_ALLOC = 1,
-	RESERVE_ALLOC_NO_ACCOUNT = 2,
-};
-
 static int update_block_group(struct btrfs_trans_handle *trans,
 			      struct btrfs_root *root, u64 bytenr,
 			      u64 num_bytes, int alloc);
@@ -105,7 +90,7 @@ static int find_next_key(struct btrfs_path *path, int level,
 static void dump_space_info(struct btrfs_space_info *info, u64 bytes,
 			    int dump_block_groups);
 static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
-				    u64 num_bytes, int reserve, int delalloc);
+				    u64 ram_bytes, u64 num_bytes, int delalloc);
 static int btrfs_free_reserved_bytes(struct btrfs_block_group_cache *cache,
 				     u64 num_bytes, int delalloc);
 static int block_rsv_use_bytes(struct btrfs_block_rsv *block_rsv,
@@ -3491,7 +3476,6 @@ again:
 		dcs = BTRFS_DC_SETUP;
 	else if (ret == -ENOSPC)
 		set_bit(BTRFS_TRANS_CACHE_ENOSPC, &trans->transaction->flags);
-	btrfs_free_reserved_data_space(inode, 0, num_pages);
 
 out_put:
 	iput(inode);
@@ -6300,8 +6284,9 @@ void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg)
 /**
  * btrfs_add_reserved_bytes - update the block_group and space info counters
  * @cache:	The cache we are manipulating
+ * @ram_bytes:  The number of bytes of file content, and will be same to
+ *              @num_bytes except for the compress path.
  * @num_bytes:	The number of bytes in question
- * @reserve:	One of the reservation enums
  * @delalloc:   The blocks are allocated for the delalloc write
  *
  * This is called by the allocator when it reserves space. Metadata
@@ -6316,7 +6301,7 @@ void btrfs_wait_block_group_reservations(struct btrfs_block_group_cache *bg)
  * succeeds.
  */
 static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
-				    u64 num_bytes, int reserve, int delalloc)
+				    u64 ram_bytes, u64 num_bytes, int delalloc)
 {
 	struct btrfs_space_info *space_info = cache->space_info;
 	int ret = 0;
@@ -6328,13 +6313,11 @@ static int btrfs_add_reserved_bytes(struct btrfs_block_group_cache *cache,
 	} else {
 		cache->reserved += num_bytes;
 		space_info->bytes_reserved += num_bytes;
-		if (reserve == RESERVE_ALLOC) {
-			trace_btrfs_space_reservation(cache->fs_info,
-					"space_info", space_info->flags,
-					num_bytes, 0);
-			space_info->bytes_may_use -= num_bytes;
-		}
 
+		trace_btrfs_space_reservation(cache->fs_info,
+				"space_info", space_info->flags,
+				num_bytes, 0);
+		space_info->bytes_may_use -= ram_bytes;
 		if (delalloc)
 			cache->delalloc_bytes += num_bytes;
 	}
@@ -7218,9 +7201,9 @@ btrfs_release_block_group(struct btrfs_block_group_cache *cache,
  * the free space extent currently.
  */
 static noinline int find_free_extent(struct btrfs_root *orig_root,
-				     u64 num_bytes, u64 empty_size,
-				     u64 hint_byte, struct btrfs_key *ins,
-				     u64 flags, int delalloc)
+				u64 ram_bytes, u64 num_bytes, u64 empty_size,
+				u64 hint_byte, struct btrfs_key *ins,
+				u64 flags, int delalloc)
 {
 	int ret = 0;
 	struct btrfs_root *root = orig_root->fs_info->extent_root;
@@ -7232,8 +7215,6 @@ static noinline int find_free_extent(struct btrfs_root *orig_root,
 	struct btrfs_space_info *space_info;
 	int loop = 0;
 	int index = __get_raid_index(flags);
-	int alloc_type = (flags & BTRFS_BLOCK_GROUP_DATA) ?
-		RESERVE_ALLOC_NO_ACCOUNT : RESERVE_ALLOC;
 	bool failed_cluster_refill = false;
 	bool failed_alloc = false;
 	bool use_cluster = true;
@@ -7565,8 +7546,8 @@ checks:
 					     search_start - offset);
 		BUG_ON(offset > search_start);
 
-		ret = btrfs_add_reserved_bytes(block_group, num_bytes,
-					       alloc_type, delalloc);
+		ret = btrfs_add_reserved_bytes(block_group, ram_bytes,
+				num_bytes, delalloc);
 		if (ret == -EAGAIN) {
 			btrfs_add_free_space(block_group, offset, num_bytes);
 			goto loop;
@@ -7739,7 +7720,7 @@ again:
 	up_read(&info->groups_sem);
 }
 
-int btrfs_reserve_extent(struct btrfs_root *root,
+int btrfs_reserve_extent(struct btrfs_root *root, u64 ram_bytes,
 			 u64 num_bytes, u64 min_alloc_size,
 			 u64 empty_size, u64 hint_byte,
 			 struct btrfs_key *ins, int is_data, int delalloc)
@@ -7751,8 +7732,8 @@ int btrfs_reserve_extent(struct btrfs_root *root,
 	flags = btrfs_get_alloc_profile(root, is_data);
 again:
 	WARN_ON(num_bytes < root->sectorsize);
-	ret = find_free_extent(root, num_bytes, empty_size, hint_byte, ins,
-			       flags, delalloc);
+	ret = find_free_extent(root, ram_bytes, num_bytes, empty_size,
+			       hint_byte, ins, flags, delalloc);
 	if (!ret && !is_data) {
 		btrfs_dec_block_group_reservations(root->fs_info,
 						   ins->objectid);
@@ -7761,6 +7742,7 @@ again:
 			num_bytes = min(num_bytes >> 1, ins->offset);
 			num_bytes = round_down(num_bytes, root->sectorsize);
 			num_bytes = max(num_bytes, min_alloc_size);
+			ram_bytes = num_bytes;
 			if (num_bytes == min_alloc_size)
 				final_tried = true;
 			goto again;
@@ -8029,7 +8011,7 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
 		return -EINVAL;
 
 	ret = btrfs_add_reserved_bytes(block_group, ins->offset,
-				       RESERVE_ALLOC_NO_ACCOUNT, 0);
+				       ins->offset, 0);
 	BUG_ON(ret); /* logic error */
 	ret = alloc_reserved_file_extent(trans, root, 0, root_objectid,
 					 0, owner, offset, ins, 1);
@@ -8171,7 +8153,7 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
 	if (IS_ERR(block_rsv))
 		return ERR_CAST(block_rsv);
 
-	ret = btrfs_reserve_extent(root, blocksize, blocksize,
+	ret = btrfs_reserve_extent(root, blocksize, blocksize, blocksize,
 				   empty_size, hint, &ins, 0, 0);
 	if (ret)
 		goto out_unuse;
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 2234e88..b4d9258 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2669,6 +2669,7 @@ static long btrfs_fallocate(struct file *file, int mode,
 
 	alloc_start = round_down(offset, blocksize);
 	alloc_end = round_up(offset + len, blocksize);
+	cur_offset = alloc_start;
 
 	/* Make sure we aren't being give some crap mode */
 	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
@@ -2761,7 +2762,6 @@ static long btrfs_fallocate(struct file *file, int mode,
 
 	/* First, check if we exceed the qgroup limit */
 	INIT_LIST_HEAD(&reserve_list);
-	cur_offset = alloc_start;
 	while (1) {
 		em = btrfs_get_extent(inode, NULL, 0, cur_offset,
 				      alloc_end - cur_offset, 0);
@@ -2788,6 +2788,14 @@ static long btrfs_fallocate(struct file *file, int mode,
 					last_byte - cur_offset);
 			if (ret < 0)
 				break;
+		} else {
+			/*
+			 * Do not need to reserve unwritten extent for this
+			 * range, free reserved data space first, otherwise
+			 * it'll result in false ENOSPC error.
+			 */
+			btrfs_free_reserved_data_space(inode, cur_offset,
+				last_byte - cur_offset);
 		}
 		free_extent_map(em);
 		cur_offset = last_byte;
@@ -2805,6 +2813,9 @@ static long btrfs_fallocate(struct file *file, int mode,
 					range->start,
 					range->len, 1 << inode->i_blkbits,
 					offset + len, &alloc_hint);
+		else
+			btrfs_free_reserved_data_space(inode, range->start,
+						       range->len);
 		list_del(&range->list);
 		kfree(range);
 	}
@@ -2839,18 +2850,11 @@ out_unlock:
 	unlock_extent_cached(&BTRFS_I(inode)->io_tree, alloc_start, locked_end,
 			     &cached_state, GFP_KERNEL);
 out:
-	/*
-	 * As we waited the extent range, the data_rsv_map must be empty
-	 * in the range, as written data range will be released from it.
-	 * And for prealloacted extent, it will also be released when
-	 * its metadata is written.
-	 * So this is completely used as cleanup.
-	 */
-	btrfs_qgroup_free_data(inode, alloc_start, alloc_end - alloc_start);
 	inode_unlock(inode);
 	/* Let go of our reservation. */
-	btrfs_free_reserved_data_space(inode, alloc_start,
-				       alloc_end - alloc_start);
+	if (ret != 0)
+		btrfs_free_reserved_data_space(inode, alloc_start,
+				       alloc_end - cur_offset);
 	return ret;
 }
 
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index 70107f7..e59e7d6 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -495,10 +495,9 @@ again:
 	ret = btrfs_prealloc_file_range_trans(inode, trans, 0, 0, prealloc,
 					      prealloc, prealloc, &alloc_hint);
 	if (ret) {
-		btrfs_delalloc_release_space(inode, 0, prealloc);
+		btrfs_delalloc_release_metadata(inode, prealloc);
 		goto out_put;
 	}
-	btrfs_free_reserved_data_space(inode, 0, prealloc);
 
 	ret = btrfs_write_out_ino_cache(root, trans, path, inode);
 out_put:
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4421954..e0cee59 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -564,6 +564,8 @@ cont:
 						     PAGE_SET_WRITEBACK |
 						     page_error_op |
 						     PAGE_END_WRITEBACK);
+			btrfs_free_reserved_data_space_noquota(inode, start,
+						end - start + 1);
 			goto free_pages_out;
 		}
 	}
@@ -739,7 +741,7 @@ retry:
 		lock_extent(io_tree, async_extent->start,
 			    async_extent->start + async_extent->ram_size - 1);
 
-		ret = btrfs_reserve_extent(root,
+		ret = btrfs_reserve_extent(root, async_extent->ram_size,
 					   async_extent->compressed_size,
 					   async_extent->compressed_size,
 					   0, alloc_hint, &ins, 1, 1);
@@ -966,7 +968,8 @@ static noinline int cow_file_range(struct inode *inode,
 				     EXTENT_DEFRAG, PAGE_UNLOCK |
 				     PAGE_CLEAR_DIRTY | PAGE_SET_WRITEBACK |
 				     PAGE_END_WRITEBACK);
-
+			btrfs_free_reserved_data_space_noquota(inode, start,
+						end - start + 1);
 			*nr_written = *nr_written +
 			     (end - start + PAGE_SIZE) / PAGE_SIZE;
 			*page_started = 1;
@@ -986,7 +989,7 @@ static noinline int cow_file_range(struct inode *inode,
 		unsigned long op;
 
 		cur_alloc_size = disk_num_bytes;
-		ret = btrfs_reserve_extent(root, cur_alloc_size,
+		ret = btrfs_reserve_extent(root, cur_alloc_size, cur_alloc_size,
 					   root->sectorsize, 0, alloc_hint,
 					   &ins, 1, 1);
 		if (ret < 0)
@@ -1485,8 +1488,10 @@ out_check:
 		extent_clear_unlock_delalloc(inode, cur_offset,
 					     cur_offset + num_bytes - 1,
 					     locked_page, EXTENT_LOCKED |
-					     EXTENT_DELALLOC, PAGE_UNLOCK |
-					     PAGE_SET_PRIVATE2);
+					     EXTENT_DELALLOC |
+					     EXTENT_CLEAR_DATA_RESV,
+					     PAGE_UNLOCK | PAGE_SET_PRIVATE2);
+
 		if (!nolock && nocow)
 			btrfs_end_write_no_snapshoting(root);
 		cur_offset = extent_end;
@@ -1803,7 +1808,9 @@ static void btrfs_clear_bit_hook(struct inode *inode,
 			return;
 
 		if (root->root_key.objectid != BTRFS_DATA_RELOC_TREE_OBJECTID
-		    && do_list && !(state->state & EXTENT_NORESERVE))
+		    && do_list && !(state->state & EXTENT_NORESERVE)
+		    && (*bits & (EXTENT_DO_ACCOUNTING |
+		    EXTENT_CLEAR_DATA_RESV)))
 			btrfs_free_reserved_data_space_noquota(inode,
 					state->start, len);
 
@@ -7214,7 +7221,7 @@ static struct extent_map *btrfs_new_extent_direct(struct inode *inode,
 	int ret;
 
 	alloc_hint = get_extent_allocation_hint(inode, start, len);
-	ret = btrfs_reserve_extent(root, len, root->sectorsize, 0,
+	ret = btrfs_reserve_extent(root, len, len, root->sectorsize, 0,
 				   alloc_hint, &ins, 1, 1);
 	if (ret)
 		return ERR_PTR(ret);
@@ -7714,6 +7721,13 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
 				ret = PTR_ERR(em2);
 				goto unlock_err;
 			}
+			/*
+			 * For inode marked NODATACOW or extent marked PREALLOC,
+			 * use the existing or preallocated extent, so does not
+			 * need to adjust btrfs_space_info's bytes_may_use.
+			 */
+			btrfs_free_reserved_data_space_noquota(inode,
+					start, len);
 			goto unlock;
 		}
 	}
@@ -7748,7 +7762,6 @@ unlock:
 		i_size_write(inode, start + len);
 
 	adjust_dio_outstanding_extents(inode, dio_data, len);
-	btrfs_free_reserved_data_space(inode, start, len);
 	WARN_ON(dio_data->reserve < len);
 	dio_data->reserve -= len;
 	dio_data->unsubmitted_oe_range_end = start + len;
@@ -10269,6 +10282,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 	u64 last_alloc = (u64)-1;
 	int ret = 0;
 	bool own_trans = true;
+	u64 end = start + num_bytes - 1;
 
 	if (trans)
 		own_trans = false;
@@ -10290,8 +10304,8 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 		 * sized chunks.
 		 */
 		cur_bytes = min(cur_bytes, last_alloc);
-		ret = btrfs_reserve_extent(root, cur_bytes, min_size, 0,
-					   *alloc_hint, &ins, 1, 0);
+		ret = btrfs_reserve_extent(root, cur_bytes, cur_bytes,
+				min_size, 0, *alloc_hint, &ins, 1, 0);
 		if (ret) {
 			if (own_trans)
 				btrfs_end_transaction(trans, root);
@@ -10377,6 +10391,9 @@ next:
 		if (own_trans)
 			btrfs_end_transaction(trans, root);
 	}
+	if (cur_offset < end)
+		btrfs_free_reserved_data_space(inode, cur_offset,
+			end - cur_offset + 1);
 	return ret;
 }
 
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index a0de885..f39c4db 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3032,6 +3032,7 @@ int prealloc_file_extent_cluster(struct inode *inode,
 	int ret = 0;
 	u64 prealloc_start = cluster->start - offset;
 	u64 prealloc_end = cluster->end - offset;
+	u64 cur_offset;
 
 	BUG_ON(cluster->start != cluster->boundary[0]);
 	inode_lock(inode);
@@ -3041,6 +3042,7 @@ int prealloc_file_extent_cluster(struct inode *inode,
 	if (ret)
 		goto out;
 
+	cur_offset = prealloc_start;
 	while (nr < cluster->nr) {
 		start = cluster->boundary[nr] - offset;
 		if (nr + 1 < cluster->nr)
@@ -3050,16 +3052,21 @@ int prealloc_file_extent_cluster(struct inode *inode,
 
 		lock_extent(&BTRFS_I(inode)->io_tree, start, end);
 		num_bytes = end + 1 - start;
+		if (cur_offset < start)
+			btrfs_free_reserved_data_space(inode, cur_offset,
+					start - cur_offset);
 		ret = btrfs_prealloc_file_range(inode, 0, start,
 						num_bytes, num_bytes,
 						end + 1, &alloc_hint);
+		cur_offset = end + 1;
 		unlock_extent(&BTRFS_I(inode)->io_tree, start, end);
 		if (ret)
 			break;
 		nr++;
 	}
-	btrfs_free_reserved_data_space(inode, prealloc_start,
-				       prealloc_end + 1 - prealloc_start);
+	if (cur_offset < prealloc_end)
+		btrfs_free_reserved_data_space(inode, cur_offset,
+				       prealloc_end + 1 - cur_offset);
 out:
 	inode_unlock(inode);
 	return ret;
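The tail release added to __btrfs_prealloc_file_range() in the diff above is plain range arithmetic, and a small standalone sketch may make it easier to check. The helper name toy_unallocated_tail() is hypothetical and is not a kernel function; it only mirrors the "end = start + num_bytes - 1" and "if (cur_offset < end)" lines of the hunk, assuming cur_offset is the first byte the allocation loop did not reach.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel function: given the requested range
 * [start, start + num_bytes) and the offset the allocation loop reached,
 * return how many reserved bytes were never turned into extents and
 * would therefore be handed back to bytes_may_use.
 */
static uint64_t toy_unallocated_tail(uint64_t start, uint64_t num_bytes,
				     uint64_t cur_offset)
{
	uint64_t end;

	assert(num_bytes > 0);
	end = start + num_bytes - 1;	/* inclusive end, as in the patch */
	assert(cur_offset >= start && cur_offset <= end + 1);

	/* Same condition and length as the hunk in the diff. */
	if (cur_offset < end)
		return end - cur_offset + 1;
	return 0;
}
```

When the loop allocates the whole range, cur_offset ends up at end + 1 and nothing is released; when it stops early, the remainder of the range is released in one call.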
This patch fixes some false ENOSPC errors. The test script below
reproduces one of them:

	#!/bin/bash
	dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
	dev=$(losetup --show -f fs.img)
	mkfs.btrfs -f -M $dev
	mkdir /tmp/mntpoint
	mount $dev /tmp/mntpoint
	cd /tmp/mntpoint
	xfs_io -f -c "falloc 0 $((64*1024*1024))" testfile

The script fails with ENOSPC even though the fs still has enough free
space to satisfy the request. See the call graph:

btrfs_fallocate()
|-> btrfs_alloc_data_chunk_ondemand()
|   bytes_may_use += 64M
|-> btrfs_prealloc_file_range()
    |-> btrfs_reserve_extent()
        |-> btrfs_add_reserved_bytes()
        |   alloc_type is RESERVE_ALLOC_NO_ACCOUNT, so it does not
        |   change bytes_may_use, and bytes_reserved += 64M. Now
        |   bytes_may_use + bytes_reserved == 128M, which is greater
        |   than btrfs_space_info's total_bytes, so a false ENOSPC occurs.
        |   Note that bytes_may_use is only decreased at the end of
        |   btrfs_fallocate(), which is too late.

Here is another simple case, for buffered write:

                CPU 1                  |              CPU 2
                                       |
|-> cow_file_range()                   |-> __btrfs_buffered_write()
    |-> btrfs_reserve_extent()         |   |
    |                                  |   |
    |                                  |   |
    |     .....                        |   |-> btrfs_check_data_free_space()
    |                                  |
    |                                  |
    |-> extent_clear_unlock_delalloc() |

On CPU 1, btrfs_reserve_extent()->find_free_extent()->
btrfs_add_reserved_bytes() does not decrease bytes_may_use; the decrease
is delayed until extent_clear_unlock_delalloc(). Assume that in this
case btrfs_reserve_extent() reserved 128MB of data and that CPU 2's
btrfs_check_data_free_space() tries to reserve 100MB of data space.
If
	100MB > data_sinfo->total_bytes - data_sinfo->bytes_used -
		data_sinfo->bytes_reserved - data_sinfo->bytes_pinned -
		data_sinfo->bytes_readonly - data_sinfo->bytes_may_use
btrfs_check_data_free_space() will try to allocate a new data chunk,
call btrfs_start_delalloc_roots(), or commit the current transaction in
order to reserve some free space, which is obviously a lot of work. None
of it is necessary: if bytes_may_use were decreased in time (by 128M in
this example), we would still have free space.

To fix this issue, this patch updates bytes_may_use for both data and
metadata in btrfs_add_reserved_bytes(). In the compress path the real
extent length may not be equal to the file content length, and
bytes_may_use is increased by the file content length, so introduce a
ram_bytes argument for btrfs_reserve_extent(), find_free_extent() and
btrfs_add_reserved_bytes(). The compress path can then update
bytes_may_use correctly. RESERVE_ALLOC_NO_ACCOUNT, RESERVE_ALLOC and
RESERVE_FREE can now be discarded as well.

For an inode marked NODATACOW or an extent marked PREALLOC, we can
directly call btrfs_free_reserved_data_space() to adjust bytes_may_use.

Meanwhile, __btrfs_prealloc_file_range() now calls
btrfs_free_reserved_data_space() internally on both the successful and
the failed path, so btrfs_prealloc_file_range()'s callers no longer need
to call btrfs_free_reserved_data_space().

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
---
 fs/btrfs/ctree.h       |  2 +-
 fs/btrfs/extent-tree.c | 56 +++++++++++++++++---------------------------------
 fs/btrfs/file.c        | 26 +++++++++++++----------
 fs/btrfs/inode-map.c   |  3 +--
 fs/btrfs/inode.c       | 37 ++++++++++++++++++++++++---------
 fs/btrfs/relocation.c  | 11 ++++++++--
 6 files changed, 72 insertions(+), 63 deletions(-)
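
The counter arithmetic behind the fallocate case can be replayed outside the kernel. The following is only a toy model under stated assumptions: struct toy_space_info and the toy_* functions are hypothetical names, the struct copies just the counters named in the commit message, and "committed" is simply the sum that the commit message compares against total_bytes. It is not the kernel's code, and it ignores locking, block groups and tracepoints.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MIB (1024ULL * 1024ULL)

/* Toy copy of the counters the commit message names; not the kernel struct. */
struct toy_space_info {
	uint64_t total_bytes;
	uint64_t bytes_used;
	uint64_t bytes_reserved;
	uint64_t bytes_pinned;
	uint64_t bytes_readonly;
	uint64_t bytes_may_use;
};

/* Everything currently counted against total_bytes. */
static uint64_t toy_committed(const struct toy_space_info *s)
{
	return s->bytes_used + s->bytes_reserved + s->bytes_pinned +
	       s->bytes_readonly + s->bytes_may_use;
}

/* First step of both paths: the caller announces it may use ram_bytes. */
static void toy_may_use(struct toy_space_info *s, uint64_t ram_bytes)
{
	s->bytes_may_use += ram_bytes;
}

/* Data path before the patch: bytes_may_use is left for a later cleanup. */
static void toy_add_reserved_old(struct toy_space_info *s, uint64_t num_bytes)
{
	s->bytes_reserved += num_bytes;
}

/*
 * After the patch: the extent length (num_bytes) is added to
 * bytes_reserved and the file content length (ram_bytes) is removed
 * from bytes_may_use in the same step. The two differ only in the
 * compress path.
 */
static void toy_add_reserved_new(struct toy_space_info *s,
				 uint64_t ram_bytes, uint64_t num_bytes)
{
	assert(s->bytes_may_use >= ram_bytes);
	s->bytes_reserved += num_bytes;
	s->bytes_may_use -= ram_bytes;
}
```

Taking total_bytes = 100M purely for illustration (the real value for the 128M test image is not given in the commit message), a 64M falloc leaves the old path with toy_committed() == 128M, which exceeds total_bytes, while the patched path stays at 64M. In the compress path, a call such as toy_add_reserved_new(s, 128K, 32K) clears the full 128K from bytes_may_use while reserving only the 32K extent.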