[U-BOOT,2/3] btrfs: btrfs_file_read: allow opportunistic read until the end

Message ID 20230418-btrfs-extent-reads-v1-2-47ba9839f0cc@codewreck.org (mailing list archive)
State New, archived
Series btrfs: fix and improve read code

Commit Message

Dominique Martinet April 18, 2023, 1:17 a.m. UTC
From: Dominique Martinet <dominique.martinet@atmark-techno.com>

btrfs_file_read main 'aligned read' loop would limit the last read to
the aligned end even if the data is present in the extent: there is no
reason not to read the data that we know will be presented correctly.

If that somehow fails cur only advances up to the aligned_end anyway and
the file tail is read through the old code unchanged; this could be
required if e.g. the end data is inlined.

Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
---
 fs/btrfs/inode.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

Comments

Qu Wenruo April 18, 2023, 2:02 a.m. UTC | #1
On 2023/4/18 09:17, Dominique Martinet wrote:
> From: Dominique Martinet <dominique.martinet@atmark-techno.com>
> 
> btrfs_file_read main 'aligned read' loop would limit the last read to
> the aligned end even if the data is present in the extent: there is no
> reason not to read the data that we know will be presented correctly.
> 
> If that somehow fails cur only advances up to the aligned_end anyway and
> the file tail is read through the old code unchanged; this could be
> required if e.g. the end data is inlined.
> 
> Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
> ---
>   fs/btrfs/inode.c | 17 +++++++++--------
>   1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 3d6e39e6544d..efffec0f2e68 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -663,7 +663,8 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len,
>   	struct btrfs_path path;
>   	struct btrfs_key key;
>   	u64 aligned_start = round_down(file_offset, fs_info->sectorsize);
> -	u64 aligned_end = round_down(file_offset + len, fs_info->sectorsize);
> +	u64 end = file_offset + len;
> +	u64 aligned_end = round_down(end, fs_info->sectorsize);
>   	u64 next_offset;
>   	u64 cur = aligned_start;
>   	int ret = 0;
> @@ -743,26 +744,26 @@ int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len,
>   		extent_num_bytes = btrfs_file_extent_num_bytes(path.nodes[0],
>   							       fi);
>   		ret = btrfs_read_extent_reg(&path, fi, cur,
> -				min(extent_num_bytes, aligned_end - cur),
> +				min(extent_num_bytes, end - cur),
>   				dest + cur - file_offset);
>   		if (ret < 0)
>   			goto out;
> -		cur += min(extent_num_bytes, aligned_end - cur);
> +		cur += min(extent_num_bytes, end - cur);
>   	}
>   
>   	/* Read the tailing unaligned part*/

Can we remove this part completely?

IIRC if we read until the target end, the unaligned end part can be 
completely removed then.

Thanks,
Qu
> -	if (file_offset + len != aligned_end) {
> +	if (file_offset + len != cur) {
>   		btrfs_release_path(&path);
> -		ret = lookup_data_extent(root, &path, ino, aligned_end,
> +		ret = lookup_data_extent(root, &path, ino, cur,
>   					 &next_offset);
>   		/* <0 is error, >0 means no extent */
>   		if (ret)
>   			goto out;
>   		fi = btrfs_item_ptr(path.nodes[0], path.slots[0],
>   				    struct btrfs_file_extent_item);
> -		ret = read_and_truncate_page(&path, fi, aligned_end,
> -				file_offset + len - aligned_end,
> -				dest + aligned_end - file_offset);
> +		ret = read_and_truncate_page(&path, fi, cur,
> +				file_offset + len - cur,
> +				dest + cur - file_offset);
>   	}
>   out:
>   	btrfs_release_path(&path);
>

Dominique Martinet April 18, 2023, 2:41 a.m. UTC | #2
Qu Wenruo wrote on Tue, Apr 18, 2023 at 10:02:00AM +0800:
> >   	/* Read the tailing unaligned part*/
> 
> Can we remove this part completely?
> 
> IIRC if we read until the target end, the unaligned end part can be
> completely removed then.

The "Read the aligned part" loop stops at aligned_end:
> while (cur < aligned_end)

So it should be possible that the last aligned extent we consider does
not contain data until the end, e.g. an offset that ends with the
aligned end:

0            4096    4123
[extent1-----|extent2]

In this case the main loop will only read extent1 and we need the
"trailing unaligned part" code for extent2.

I have a feeling the loop could just be updated to go to the end
`while (cur < end)` as it doesn't seem to care about the end
alignment... Should I update v2 to do that instead?
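
The difference between the two loop bounds can be illustrated with a small simulation of the extent walk. This is only a sketch under simplifying assumptions, not the U-Boot code: `struct sim_extent` and `sim_main_loop()` are hypothetical names, extents are assumed contiguous (no holes, no inline data), and the cursor is assumed to always sit at the start of an extent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a regular extent covering [start, start + num_bytes). */
struct sim_extent {
	uint64_t start;
	uint64_t num_bytes;
};

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Walk the extents from cur while cur < loop_bound, "reading" at most up to
 * end in each step (as the patched min() does). Returns the final cursor.
 */
static uint64_t sim_main_loop(const struct sim_extent *ext, size_t n,
			      uint64_t cur, uint64_t end, uint64_t loop_bound)
{
	size_t i = 0;

	while (cur < loop_bound) {
		/* Skip extents that end at or before the cursor. */
		while (i < n && ext[i].start + ext[i].num_bytes <= cur)
			i++;
		if (i == n)
			break;
		/* Simplifying assumption: the cursor is at an extent start. */
		assert(ext[i].start == cur);
		cur += min_u64(ext[i].num_bytes, end - cur);
	}
	return cur;
}
```

With the layout from the diagram (extent1 covering 0..4096, extent2 covering 4096..4123) and a read of the whole file, a loop bounded by aligned_end (4096) stops with cur at 4096 and leaves 27 bytes for the tail code, while a loop bounded by end (4123) finishes with cur at 4123.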



This made me look at the "Read out the leading unaligned part" initial
part, and its check only looks at the sector size but it probably has
the same problem -- we want to make sure we read any leftover from a
previous extent e.g. this file:
0              4096       8192
[extent1------------------|extent2]
and reading from 4k with a sectorsize of 4k is probably bad: it will enter
the aligned main loop right away...
And I think that'll fail?...
Actually not quite sure what's expecting what to be aligned in there,
but I just tried some partial reads from non-zero offsets and my board
resets almost all the time so I guess I've found something else to dig
into.
This isn't a priority for me right now but I'll look a bit more when I
have more time if you haven't first.

Qu Wenruo April 18, 2023, 2:53 a.m. UTC | #3
On 2023/4/18 10:41, Dominique Martinet wrote:
> Qu Wenruo wrote on Tue, Apr 18, 2023 at 10:02:00AM +0800:
>>>    	/* Read the tailing unaligned part*/
>>
>> Can we remove this part completely?
>>
>> IIRC if we read until the target end, the unaligned end part can be
>> completely removed then.
> 
> The "Read the aligned part" loop stops at aligned_end:
>> while (cur < aligned_end)
> 
> So it should be possible that the last aligned extent we consider does
> not contain data until the end, e.g. an offset that ends with the
> aligned end:
> 
> 0            4096    4123
> [extent1-----|extent2]
> 
> In this case the main loop will only read extent1 and we need the
> "trailing unaligned part" code for extent2.
> 
> I have a feeling the loop could just be updated to go to the end
> `while (cur < end)` as it doesn't seem to care about the end
> alignment... Should I update v2 to do that instead?

Yeah, it would be very awesome if you can remove the trailing part
completely.

> 
> 
> 
> This made me look at the "Read out the leading unaligned part" initial
> part, and its check only looks at the sector size but it probably has
> the same problem -- we want to make sure we read any leftover from a
> previous extent e.g. this file:

If you're talking about the same problem mentioned in patch 1, then yes, 
any read in the middle of an extent would cause problems.

No matter if it's aligned or not, as btrfs would convert any unaligned 
part into aligned read.

But if we fix the bug you mentioned, I see nothing that can go wrong.

An unaligned read is converted into an aligned one (and the target range
is then copied into the dest buf), and the aligned part (including the
trailing unaligned part) would then be handled properly.
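
A minimal sketch of that "read aligned, then copy the target range" pattern follows. It is an illustration only: `SIM_SECTORSIZE`, `sim_read_sector()` and `sim_read_unaligned()` are hypothetical names, the "disk" is a plain memory buffer, and the requested range is assumed not to cross a sector boundary. The real helper discussed in this thread is read_and_truncate_page(), whose implementation is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIM_SECTORSIZE 16u /* tiny sector size, for illustration only */

/* Hypothetical backing store that only supports whole-sector reads. */
static void sim_read_sector(const char *disk, uint64_t sector_start, char *out)
{
	assert(sector_start % SIM_SECTORSIZE == 0);
	memcpy(out, disk + sector_start, SIM_SECTORSIZE);
}

/* Read len bytes at an arbitrary offset through a sector-sized bounce buffer. */
static void sim_read_unaligned(const char *disk, uint64_t offset, uint64_t len,
			       char *dest)
{
	char bounce[SIM_SECTORSIZE];
	uint64_t aligned = offset - offset % SIM_SECTORSIZE; /* round_down() */

	/* Simplifying assumption: the range stays within one sector. */
	assert(offset - aligned + len <= SIM_SECTORSIZE);
	sim_read_sector(disk, aligned, bounce);
	memcpy(dest, bounce + (offset - aligned), len);
}
```

The cost of this approach is one extra sector-sized read and copy for each unaligned edge, which is why folding the tail into the main loop can avoid a double read.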

> 0              4096       8192
> [extent1------------------|extent2]
> and reading from 4k with a sectorsize of 4k is probably bad: it will enter
> the aligned main loop right away...
> And I think that'll fail?...
> Actually not quite sure what's expecting what to be aligned in there,
> but I just tried some partial reads from non-zero offsets and my board
> resets almost all the time so I guess I've found something else to dig
> into.
> This isn't a priority for me right now but I'll look a bit more when I
> have more time if you haven't first.
> 
My initial plan is to merge the tailing part into the aligned loop, but 
since you're already working in this part, feel free to do it.

And I'm very happy to provide any help on this.

Thanks,
Qu

Dominique Martinet April 18, 2023, 3:07 a.m. UTC | #4
Qu Wenruo wrote on Tue, Apr 18, 2023 at 10:53:41AM +0800:
> > I have a feeling the loop could just be updated to go to the end
> > `while (cur < end)` as it doesn't seem to care about the end
> > alignment... Should I update v2 to do that instead?
> 
> Yeah, it would be very awesome if you can remove the tailing part
> completely.

Ok, will give it a try.
I'll want to test this a bit so might take a day or two as I have other
work to finish first.

> > This made me look at the "Read out the leading unaligned part" initial
> > part, and its check only looks at the sector size but it probably has
> > the same problem -- we want to make sure we read any leftover from a
> > previous extent e.g. this file:
> 
> If you're talking about the same problem mentioned in patch 1, then yes, any
> read in the middle of an extent would cause problems.

No, was just thinking the leading part being a separate loop doesn't
seem to make sense either as the code shouldn't care about sector size
alignment but about full extents.
If the main loop handles everything correctly then the leading if can
also be removed along with "read_and_truncate_page" that would no longer
be used.

I'll give this a try as well and report back.

> No matter if it's aligned or not, as btrfs would convert any unaligned part
> into aligned read.

Yes, I don't see where it would fail (except that my board does crash).
I guess that at this point spending some time building a qemu
u-boot and hooking up gdb will be faster.

> My initial plan is to merge the tailing part into the aligned loop, but
> since you're already working in this part, feel free to do it.

Yes, sure -- removing the if is easy, I'd just rather not make it fail
for someone else :)

Qu Wenruo April 18, 2023, 3:21 a.m. UTC | #5
On 2023/4/18 11:07, Dominique Martinet wrote:
> Qu Wenruo wrote on Tue, Apr 18, 2023 at 10:53:41AM +0800:
>>> I have a feeling the loop could just be updated to go to the end
>>> `while (cur < end)` as it doesn't seem to care about the end
>>> alignment... Should I update v2 to do that instead?
>>
>> Yeah, it would be very awesome if you can remove the tailing part
>> completely.
> 
> Ok, will give it a try.
> I'll want to test this a bit so might take a day or two as I have other
> work to finish first.
> 
>>> This made me look at the "Read out the leading unaligned part" initial
>>> part, and its check only looks at the sector size but it probably has
>>> the same problem -- we want to make sure we read any leftover from a
>>> previous extent e.g. this file:
>>
>> If you're talking about the same problem mentioned in patch 1, then yes, any
>> read in the middle of an extent would cause problems.
> 
> No, was just thinking the leading part being a separate loop doesn't
> seem to make sense either as the code shouldn't care about sector size
> alignment but about full extents.

The main concern with the leading unaligned part is that we need to
skip something unaligned at the beginning, while all other situations
never need such a skip (at most they skip the trailing part).

Maybe we can do some extra calculation to handle it, but I'm not in a 
hurry to address the leading part.

Another thing is that all other fs-ish interfaces (kernel/fuse etc.)
read/write in a certain block size, then handle the unaligned part in a
higher layer.
Thus I prefer to have most things handled in an aligned fashion, and 
only handle the unaligned part specially.

But if you can find an elegant way to handle all cases, that would be 
really awesome!

Thanks,
Qu

> If the main loop handles everything correctly then the leading if can
> also be removed along with "read_and_truncate_page" that would no longer
> be used.
> 
> I'll give this a try as well and report back.
> 
>> No matter if it's aligned or not, as btrfs would convert any unaligned part
>> into aligned read.
> 
> Yes I don't see where it would fail (except that my board does crash),
> I guess that at this point I should spend some time building a qemu
> u-boot and hooking up gdb will be faster..
> 
>> My initial plan is to merge the tailing part into the aligned loop, but
>> since you're already working in this part, feel free to do it.
> 
> Yes, sure -- removing the if is easy, I'd just rather not make it fail
> for someone else :)
>

Dominique Martinet April 18, 2023, 3:53 a.m. UTC | #6
Qu Wenruo wrote on Tue, Apr 18, 2023 at 11:21:00AM +0800:
> > No, was just thinking the leading part being a separate loop doesn't
> > seem to make sense either as the code shouldn't care about sector size
> > alignment but about full extents.
> 
> The main concern related to the leading unaligned part is, we need to skip
> something unaligned from the beginning , while all other situations never
> need to skip such case (they at most skip the tailing part).

Ok, there is one exception for inline extents apparently.. But I'm
still not convinced the `aligned_start != file_offset` check is enough
for that either; I'd say it's unlikely but the inline part can be
compressed, so we could have a file which has > 4k (sectorsize) of
expanded data, so a read from the 4k offset would skip the special
handling and fail (reading the whole extent in dest)

Even if that's not possible, reading just the first 10 bytes of an
inline extent will be aligned and go through the main loop which just
reads the whole extent, so it'll need the same handling as the regular
btrfs_read_extent_reg handling at which point it might just as well
handle start offset too.


That aside taking the loop in order:
- lookup_data_extent doesn't care (used in heading/tail)
- skipping holes doesn't care as it explicitly puts the cursor at the
start of the next extent (or bails out if there is nothing next)
- inline needs fixing anyway as said above
- PREALLOC or nothing on disk also goes straight to next and is ok
- ah, I see what you meant now, we need to subtract the current
position within the extent from extent_num_bytes...
That's also already a problem, though; to take the same example:
0                 8k           16k
[extent1          | extent2 ... ]
reading from 4k onwards will try to read
min(extent_num_bytes, end-cur) = min(8k, 12k) = 8k
from the 4k offset which goes over the end of the extent.

That could actually be my resets from the previous mail.
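
The arithmetic above can be checked with a small sketch. It assumes a regular extent starting at `extent_start`; `len_uncapped()` and `len_capped()` are hypothetical helper names, and the capped variant is just one way of expressing "subtract the current position within the extent", not the actual fix.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Length as computed in the loop today: ignores where cur sits in the extent. */
static uint64_t len_uncapped(uint64_t extent_num_bytes, uint64_t cur,
			     uint64_t end)
{
	return min_u64(extent_num_bytes, end - cur);
}

/* Length capped to what is left of the extent after cur. */
static uint64_t len_capped(uint64_t extent_start, uint64_t extent_num_bytes,
			   uint64_t cur, uint64_t end)
{
	uint64_t extent_end = extent_start + extent_num_bytes;

	assert(cur >= extent_start && cur < extent_end);
	return min_u64(extent_end - cur, end - cur);
}
```

For extent1 covering 0..8k and a read from 4k to 16k, the uncapped length is 8k, so the read would span 4k..12k and run past the extent end at 8k; the capped length is 4k, which stops exactly at the extent boundary.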


So either the first check should just lookup the extent and check that
extent start matches current offset instead of checking for sectorsize
alignment, or we can just fix the loop and remove the first if.

Dominique Martinet April 18, 2023, 5:17 a.m. UTC | #7
Dominique Martinet wrote on Tue, Apr 18, 2023 at 12:53:35PM +0900:
> Ok, there is one exception for inline extents apparently.. But I'm
> still not convinced the `aligned_start != file_offset` check is enough
> for that either; I'd say it's unlikely but the inline part can be
> compressed, so we could have a file which has > 4k (sectorsize) of
> expanded data, so a read from the 4k offset would skip the special
> handling and fail (reading the whole extent in dest)

(Wasn't able to create such a file, I assume that means the uncompressed
data must fit in a page -- if we deal with arm machines with 16k or 64k
page size that'll probably change things again but I'll just pretend
sector size will magically match in this case..)

> Even if that's not possible, reading just the first 10 bytes of an
> inline extent will be aligned and go through the main loop which just
> reads the whole extent, so it'll need the same handling as the regular
> btrfs_read_extent_reg handling at which point it might just as well
> handle start offset too.

Question regarding inline extent: the main loop goes straight out after
reading an inline extent, assuming it is always alone (or last) -- is
that correct?
By playing with `filefrag -v` I could sometimes see files that
temporarily have inline + a delalloc extent, but wasn't able to make a
file that kept the inline extent after appending more data, so I guess
it is also sane enough... Just making it continue and letting the loop
end seems just as simple now there is no trailing if, but if inline
extents cannot be mixed I'll be happy to keep it going out.


back to btrfs_read_extent_reg:
merging the tail back into the main loop breaks the first assert
(IS_ALIGNED(len, fs_info->sectorsize) in particular).
The old loop invocation made sure it was aligned and
read_and_truncate_page used to take care of calling it with a bigger
buffer when it was not.

I was only looking at the compressed path and that does not care about
'len' alignment because it makes an intermediate copy for decompression,
but the BTRFS_COMPRESS_NONE's read_extent_data might care?
I didn't see anything that actually requires alignment there (len in
particular should be ok, but even offset->logical seems to properly be
used as "being part of a range" in lookups so alignment doesn't actually
matter), but if this isn't tested I can understand wanting to be more
careful there.

Ok so this is rightfully less obvious than I had first assumed -- sorry
for rushing in. Let's do baby steps:
- I'll resend just the first patch shortly, it's a real fix and I'll be
using it right away.
- I'll double check 'len' doesn't need to be aligned in
btrfs_read_extent_reg and just rework the main loop to allow removing
the tail exception, that'll avoid a double read in many cases and I
think it's worth doing.
Might take a bit more time as I want to finish some other work, let's
say a few days.
- That leaves the two issues I brought up with the main loop:
 * inline case ignoring end; this is minor enough but easy to fix
 * btrfs_read_extent_reg() assuming start of extent in its length
 argument; I believe it'll be easier to stop trying to set an upper limit
 in the main loop and just have btrfs_read_extent_reg() do it itself, we
 can use its return value to know how much was actually read.
 (Probably just subtract offset - key.offset from extent_num_bytes to
 have a new cap, but I still don't understand btrfs_file_extent_offset()
 in all of this as it's always 0 when I looked)
Will try to do that over the next few weeks, but if you want to look at
it feel free to do this before me.
- At this point it might be worth considering removing the initial check
as that also makes an extra small read in the unaligned case, but it's
not a bug and can wait.
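
As a rough sketch of the "let the extent reader cap the length and return how much it read" idea from the list above, here is a self-contained simulation over in-memory extents. Everything in it is hypothetical: `struct sim_extent`, `sim_read_extent()` and `sim_file_read()` are made-up names, there are no holes, inline extents, or compression, and the real btrfs_read_extent_reg() has a different signature.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory extent covering [start, start + num_bytes). */
struct sim_extent {
	uint64_t start;
	uint64_t num_bytes;
	const char *data;
};

/* Copy at most len bytes from offset; returns bytes copied, or -1 on error. */
static int64_t sim_read_extent(const struct sim_extent *e, uint64_t offset,
			       uint64_t len, char *dest)
{
	uint64_t in_extent, avail, n;

	if (offset < e->start || offset >= e->start + e->num_bytes)
		return -1;
	in_extent = offset - e->start;     /* position within the extent */
	avail = e->num_bytes - in_extent;  /* new cap: what is left of it */
	n = len < avail ? len : avail;
	memcpy(dest, e->data + in_extent, n);
	return (int64_t)n;
}

/* Single loop up to end; the cursor advances by what was actually read. */
static int64_t sim_file_read(const struct sim_extent *ext, size_t n,
			     uint64_t file_offset, uint64_t len, char *dest)
{
	uint64_t cur = file_offset, end = file_offset + len;
	size_t i = 0;

	while (cur < end) {
		int64_t ret;

		while (i < n && ext[i].start + ext[i].num_bytes <= cur)
			i++;
		if (i == n)
			break; /* past the last extent */
		ret = sim_read_extent(&ext[i], cur, end - cur,
				      dest + (cur - file_offset));
		if (ret < 0)
			return ret;
		cur += (uint64_t)ret;
	}
	return (int64_t)(cur - file_offset);
}
```

In this shape the caller no longer needs separate leading or trailing branches for regular extents: a read starting in the middle of one extent and ending in the middle of the next is served by two iterations of the same loop.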


What do you think?

Qu Wenruo April 18, 2023, 7:15 a.m. UTC | #8
On 2023/4/18 11:53, Dominique Martinet wrote:
> Qu Wenruo wrote on Tue, Apr 18, 2023 at 11:21:00AM +0800:
>>> No, was just thinking the leading part being a separate loop doesn't
>>> seem to make sense either as the code shouldn't care about sector size
>>> alignment but about full extents.
>>
>> The main concern related to the leading unaligned part is, we need to skip
>> something unaligned from the beginning , while all other situations never
>> need to skip such case (they at most skip the tailing part).
> 
> Ok, there is one exception for inline extents apparently.. But I'm
> still not convinced the `aligned_start != file_offset` check is enough
> for that either; I'd say it's unlikely but the inline part can be
> compressed, so we could have a file which has > 4k (sectorsize) of
> expanded data, so a read from the 4k offset would skip the special
> handling and fail (reading the whole extent in dest)

Btrfs inline extents are limited to sectorsize.

That means, inlined compressed extent can at most be 4K sized (if 4K is 
our sector size).

So that won't be a problem.

> 
> Even if that's not possible, reading just the first 10 bytes of an
> inline extent will be aligned and go through the main loop which just
> reads the whole extent, so it'll need the same handling as the regular
> btrfs_read_extent_reg handling at which point it might just as well
> handle start offset too.

If we just read 10 bytes, the aligned part should handle it well.

My real concern is what if we read 10 bytes at offset 10 bytes.

If this can be handled in the same way as an aligned read (and still be
reasonably readable), then it would be awesome.

> 
> 
> That aside taking the loop in order:
> - lookup_data_extent doesn't care (used in heading/tail)
> - skipping holes doesn't care as it explicitly puts the cursor at the
> start of the next extent (or bails out if there is nothing next)
> - inline needs fixing anyway as said above
> - PREALLOC or nothing on disk also goes straight to next and is ok
> - ah, I see what you meant now, we need to subtract the current
> position within the extent from extent_num_bytes...
> That's also already a problem, though; to take the same example:
> 0                 8k           16k
> [extent1          | extent2 ... ]
> reading from 4k onwards will try to read
> min(extent_num_bytes, end-cur) = min(8k, 12k) = 8k
> from the 4k offset which goes over the end of the extent.

That's indeed a problem.

As most of the U-Boot fs drivers only care about reading the whole file
and never really use the ability to read part of a file, that path is
not properly tested.
(Some driver, IIRC ubifs?, doesn't even allow reads with a non-zero offset)

Thanks,
Qu

> 
> That could actually be my resets from the previous mail.
> 
> 
> So either the first check should just lookup the extent and check that
> extent start matches current offset instead of checking for sectorsize
> alignment, or we can just fix the loop and remove the first if.
>

Patch

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3d6e39e6544d..efffec0f2e68 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -663,7 +663,8 @@  int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len,
 	struct btrfs_path path;
 	struct btrfs_key key;
 	u64 aligned_start = round_down(file_offset, fs_info->sectorsize);
-	u64 aligned_end = round_down(file_offset + len, fs_info->sectorsize);
+	u64 end = file_offset + len;
+	u64 aligned_end = round_down(end, fs_info->sectorsize);
 	u64 next_offset;
 	u64 cur = aligned_start;
 	int ret = 0;
@@ -743,26 +744,26 @@  int btrfs_file_read(struct btrfs_root *root, u64 ino, u64 file_offset, u64 len,
 		extent_num_bytes = btrfs_file_extent_num_bytes(path.nodes[0],
 							       fi);
 		ret = btrfs_read_extent_reg(&path, fi, cur,
-				min(extent_num_bytes, aligned_end - cur),
+				min(extent_num_bytes, end - cur),
 				dest + cur - file_offset);
 		if (ret < 0)
 			goto out;
-		cur += min(extent_num_bytes, aligned_end - cur);
+		cur += min(extent_num_bytes, end - cur);
 	}
 
 	/* Read the tailing unaligned part*/
-	if (file_offset + len != aligned_end) {
+	if (file_offset + len != cur) {
 		btrfs_release_path(&path);
-		ret = lookup_data_extent(root, &path, ino, aligned_end,
+		ret = lookup_data_extent(root, &path, ino, cur,
 					 &next_offset);
 		/* <0 is error, >0 means no extent */
 		if (ret)
 			goto out;
 		fi = btrfs_item_ptr(path.nodes[0], path.slots[0],
 				    struct btrfs_file_extent_item);
-		ret = read_and_truncate_page(&path, fi, aligned_end,
-				file_offset + len - aligned_end,
-				dest + aligned_end - file_offset);
+		ret = read_and_truncate_page(&path, fi, cur,
+				file_offset + len - cur,
+				dest + cur - file_offset);
 	}
 out:
 	btrfs_release_path(&path);