generic/286: fix integer underflow on block sizes != 4096

Message ID 20210522184814.95802-1-jakobunt@gmail.com (mailing list archive)
State New, archived

Commit Message

Jakob Unterwurzacher May 22, 2021, 6:48 p.m. UTC
The read loop always requested 4096 bytes, which only works
when the total read length is a multiple of 4096 bytes.

This is not necessarily true, and when it's not, len wraps
around to UINT64_MAX and you get a lot of these:

	ERROR: [error:38] reached EOF:Success

This was caught when running xfstests against gocryptfs,
an encrypted overlay file system.

On ext4, the test still passes after this change.
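
To illustrate the wrap-around described above, here is a minimal sketch of
the counter arithmetic only (no I/O). It assumes len is an unsigned 64-bit
byte counter that is decremented by the amount read, and that every read
returns exactly the size that was requested. The helper names are
hypothetical and are not part of seek_copy_test.c.

```c
#include <stdint.h>

#define BUF_SIZE 4096
#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/*
 * Old behaviour (assumed): always subtract a full BUF_SIZE.
 * Runs at most max_iters iterations and returns the remaining len.
 * If len is not a multiple of BUF_SIZE, the last subtraction wraps.
 */
static uint64_t drain_unclamped(uint64_t len, int max_iters)
{
	for (int i = 0; i < max_iters && len > 0; i++)
		len -= BUF_SIZE;	/* wraps around when len < BUF_SIZE */
	return len;
}

/*
 * Patched behaviour: never request (and so never subtract) more than
 * what is left, so len reaches exactly 0.
 */
static uint64_t drain_clamped(uint64_t len, int max_iters)
{
	for (int i = 0; i < max_iters && len > 0; i++)
		len -= MIN(len, (uint64_t)BUF_SIZE);
	return len;
}
```

With 4046 bytes left, one unclamped iteration leaves len just below
UINT64_MAX, so the loop keeps reading far past the extent, while the clamped
variant stops at 0.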

Signed-off-by: Jakob Unterwurzacher <jakobunt@gmail.com>
---
 src/seek_copy_test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Eryu Guan May 23, 2021, 9:05 a.m. UTC | #1
On Sat, May 22, 2021 at 08:48:14PM +0200, Jakob Unterwurzacher wrote:
> The read loop always requested 4096 bytes, which only works
> when the total read length is a multiple of 4096 bytes.

The total read length should be

"The length of this extent is (hole_off - data_off)"

according to the comments above do_extent_copy(). Total read length
being not a multiple of 4k means 'data_off' or 'hole_off' is not 4k
aligned.

> 
> This is not necessarily true, and when it's not, len wraps

But generic/286 creates source files with length of all data extents and
hole extents being multiple of 4k. So I still don't understand why this
is valid for gocryptfs. Shouldn't that be a bug in seek_data/seek_hole
in gocryptfs? Could you please elaborate?

Thanks,
Eryu

> around to UINT64_MAX and you get a lot of these:
> 
> 	ERROR: [error:38] reached EOF:Success
> 
> This was caught when running xfstests against gocryptfs,
> an encrypted overlay file system.
> 
> On ext4, the test still passes after this change.
> 
> Signed-off-by: Jakob Unterwurzacher <jakobunt@gmail.com>
> ---
>  src/seek_copy_test.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/seek_copy_test.c b/src/seek_copy_test.c
> index 0c2c6a3d..28c021e2 100644
> --- a/src/seek_copy_test.c
> +++ b/src/seek_copy_test.c
> @@ -98,7 +98,7 @@ do_extent_copy(int src_fd, int dest_fd, off_t data_off, off_t hole_off)
>  	}
>  
>  	while (len > 0) {
> -		ssize_t nr_read = read(src_fd, buf, BUF_SIZE);
> +		ssize_t nr_read = read(src_fd, buf, MIN(len, BUF_SIZE));
>  		if (nr_read < 0) {
>  			if (errno == EINTR)
>  				continue;
> -- 
> 2.31.1
Jakob Unterwurzacher May 25, 2021, 5:34 p.m. UTC | #2
On Sun, May 23, 2021 at 11:05 AM Eryu Guan <guan@eryu.me> wrote:
> The total read length should be
>
> "The length of this extent is (hole_off - data_off)"
>
> according to the comments above do_extent_copy(). Total read length
> being not a multiple of 4k means 'data_off' or 'hole_off' is not 4k
> aligned.

That is correct.

> But generic/286 creates source files with length of all data extents and
> hole extents being multiple of 4k. So I still don't understand why this
> is valid for gocryptfs. Shouldn't that be a bug in seek_data/seek_hole
> in gocryptfs? Could you please elaborate?

Yes sure, the situation is a bit complicated. gocryptfs works similarly
to eCryptFS and EncFS (also overlay filesystems).
The files are stored in encrypted form in regular files on ext4 or xfs
or whatever "real disk" filesystem.
Disk space allocation & file holes are handled by the real filesystem.
A gocryptfs mount shows a decrypted view of these files.

Now, gocryptfs uses AES-GCM for encryption. This adds 32 bytes of
overhead to every 4096-byte block, which gives a storage size of
4128 bytes per block.

The encryption overhead is why the files & holes created by
generic/286 are not 4k-aligned on disk when viewed through the
gocryptfs mount.
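
As a rough illustration of that arithmetic, the sketch below computes the
on-disk size for a given plaintext length. It uses the 32 bytes of per-block
overhead stated above plus the 18-byte file header mentioned later in this
thread. The function name and constants are made up for this example, and
empty files are not modelled.

```c
#include <stdint.h>

enum {
	SZ_PLAIN_BLOCK  = 4096,	/* user data per block */
	SZ_OVERHEAD     = 32,	/* per-block encryption overhead */
	SZ_CIPHER_BLOCK = SZ_PLAIN_BLOCK + SZ_OVERHEAD,	/* 4128 */
	SZ_FILE_HEADER  = 18	/* per-file header */
};

/* On-disk size for plain_len bytes of user data (plain_len > 0). */
static uint64_t stored_size(uint64_t plain_len)
{
	uint64_t full = plain_len / SZ_PLAIN_BLOCK;	/* complete blocks */
	uint64_t tail = plain_len % SZ_PLAIN_BLOCK;	/* partial last block */

	return SZ_FILE_HEADER + full * SZ_CIPHER_BLOCK +
	       (tail ? tail + SZ_OVERHEAD : 0);
}
```

For example, 1 MiB of user data occupies 1056786 bytes on disk, which is not
a multiple of 4096. Going the other way, a 4k-aligned extent boundary on the
underlying filesystem generally lands at a plaintext offset that is not
4k-aligned.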

Thanks, Jakob
Eryu Guan May 26, 2021, 3:20 a.m. UTC | #3
On Tue, May 25, 2021 at 07:34:14PM +0200, Jakob Unterwurzacher wrote:
> On Sun, May 23, 2021 at 11:05 AM Eryu Guan <guan@eryu.me> wrote:
> > The total read length should be
> >
> > "The length of this extent is (hole_off - data_off)"
> >
> > according to the comments above do_extent_copy(). Total read length
> > being not a multiple of 4k means 'data_off' or 'hole_off' is not 4k
> > aligned.
> 
> That is correct.
> 
> > But generic/286 creates source files with length of all data extents and
> > hole extents being multiple of 4k. So I still don't understand why this
> > is valid for gocryptfs. Shouldn't that be a bug in seek_data/seek_hole
> > in gocryptfs? Could you please elaborate?
> 
> Yes sure, the situation is a bit complicated. gocryptfs works similarly
> to eCryptFS and EncFS (also overlay filesystems).
> The files are stored in encrypted form in regular files on ext4 or xfs
> or whatever "real disk" filesystem.
> Disk space allocation & file holes are handled by the real filesystem.
> A gocryptfs mount shows a decrypted view of these files.
> 
> Now, gocryptfs uses AES-GCM for encryption. This adds 32 bytes of
> overhead to every 4096-byte block,
> which gives a storage size of 4128 bytes.

Ah, that makes sense to me now. Would you please include the detailed
explanation in the commit log as well?

Thanks,
Eryu

> 
> The encryption overhead is why the files & holes created by
> generic/286 are not 4k-aligned on disk when viewed through the
> gocryptfs mount.
> 
> Thanks, Jakob
Darrick J. Wong May 26, 2021, 3:41 a.m. UTC | #4
On Wed, May 26, 2021 at 11:20:37AM +0800, Eryu Guan wrote:
> On Tue, May 25, 2021 at 07:34:14PM +0200, Jakob Unterwurzacher wrote:
> > On Sun, May 23, 2021 at 11:05 AM Eryu Guan <guan@eryu.me> wrote:
> > > The total read length should be
> > >
> > > "The length of this extent is (hole_off - data_off)"
> > >
> > > according to the comments above do_extent_copy(). Total read length
> > > being not a multiple of 4k means 'data_off' or 'hole_off' is not 4k
> > > aligned.
> > 
> > That is correct.
> > 
> > > But generic/286 creates source files with length of all data extents and
> > > hole extents being multiple of 4k. So I still don't understand why this
> > > is valid for gocryptfs. Shouldn't that be a bug in seek_data/seek_hole
> > > in gocryptfs? Could you please elaborate?
> > 
> > Yes sure, the situation is a bit complicated. gocryptfs works similarly
> > to eCryptFS and EncFS (also overlay filesystems).
> > The files are stored in encrypted form in regular files on ext4 or xfs
> > or whatever "real disk" filesystem.
> > Disk space allocation & file holes are handled by the real filesystem.
> > A gocryptfs mount shows a decrypted view of these files.
> > 
> > Now, gocryptfs uses AES-GCM for encryption. This adds 32 bytes of
> > overhead to every 4096-byte block,
> > which gives a storage size of 4128 bytes.
> 
> Ah, that makes sense to me now. Would you please include the detailed
> explanation in the commit log as well?

...and maybe a sample output of a seek_data/seek_hole scan between a
gocryptfs file and the ext4fs underneath it?  I'm still trying to wrap
my head around what the problem here is.

It might also help to describe where the 32 bytes of overhead goes --
are you interleaving the overhead inline with 4k of encrypted content?

--D

> 
> Thanks,
> Eryu
> 
> > 
> > The encryption overhead is why the files & holes created by
> > generic/286 are not 4k-aligned on disk when viewed through the
> > gocryptfs mount.
> > 
> > Thanks, Jakob
Jakob Unterwurzacher May 26, 2021, 8:02 a.m. UTC | #5
> > Ah, that makes sense to me now. Would you please include the detailed
> > explanation in the commit log as well?
>
> ...and maybe a sample output of a seek_data/seek_hole scan between a
> gocryptfs file and the ext4fs underneath it?  I'm still trying to wrap
> my head around what the problem here is.
>
> It might also help to describe where the 32 bytes of overhead goes --
> are you interleaving the overhead inline with 4k of encrypted content?

Yes, it's all inline, the file format is like this [2]:

file header  ... 18 bytes
data block 1 ... 16 bytes block header  (IV)
                 1-4096 bytes user data
                 16 bytes block trailer (MAC)
[more data blocks...]

I am attaching the SEEK_DATA/SEEK_HOLE trace below [1]. One seek goes like this:
(1) gocryptfs gets a seek() call via FUSE
(2) translate plaintext to ciphertext offset
(3) call ext4 seek
(4) translate back to plaintext offset and return
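
Steps (2) to (4) can be sketched roughly as follows. This is not the
gocryptfs code; it is a small model built from the sizes given in this
thread (18-byte header, 16-byte IV, 16-byte MAC, 4096 bytes of user data per
block) that happens to reproduce the offsets in the trace [1]. All function
names are hypothetical, the mapping of offset 0 to 0 is taken from the trace,
and replay_trace_seek() is only a stand-in for the real ext4 seek.

```c
#include <stdint.h>

#define HDR_LEN      18u	/* file header */
#define PLAIN_BLK    4096u	/* user data per block */
#define BLK_OVERHEAD 32u	/* 16-byte IV + 16-byte MAC */
#define CIPHER_BLK   (PLAIN_BLK + BLK_OVERHEAD)

/* Step (2): plaintext offset -> ciphertext offset. */
static uint64_t plain_to_cipher_off(uint64_t plain_off)
{
	uint64_t blk = plain_off / PLAIN_BLK;
	uint64_t rest = plain_off % PLAIN_BLK;

	if (plain_off == 0)
		return 0;	/* the trace shows 0 being passed through as 0 */
	return HDR_LEN + blk * CIPHER_BLK + (rest ? rest + BLK_OVERHEAD : 0);
}

/* Step (4): ciphertext offset -> plaintext offset. */
static uint64_t cipher_to_plain_off(uint64_t cipher_off)
{
	uint64_t body, blk, rest;

	if (cipher_off <= HDR_LEN)
		return 0;
	body = cipher_off - HDR_LEN;
	blk = body / CIPHER_BLK;
	rest = body % CIPHER_BLK;
	/* Subtract the per-block overhead from a partial block. */
	return blk * PLAIN_BLK + (rest > BLK_OVERHEAD ? rest - BLK_OVERHEAD : 0);
}

/* Step (3) is delegated to the backing filesystem. */
typedef uint64_t (*backing_seek_fn)(uint64_t cipher_off, int whence);

/* Steps (2)-(4) chained together. */
static uint64_t translated_seek(uint64_t plain_off, int whence,
				backing_seek_fn backing_seek)
{
	uint64_t cipher_in = plain_to_cipher_off(plain_off);
	uint64_t cipher_out = backing_seek(cipher_in, whence);

	return cipher_to_plain_off(cipher_out);
}

/* Stand-in for ext4: replays a single answer taken from the trace. */
static uint64_t replay_trace_seek(uint64_t cipher_off, int whence)
{
	(void)whence;
	return cipher_off == 1060864 ? 5283840 : cipher_off;
}
```

Note how a 4k-aligned answer from the backing filesystem (5283840) comes
back as 5242862, which points into the middle of a plaintext block.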

Actually, Eryu's first comment got me thinking about this, and what
gocryptfs does is a little stupid: the offsets that get returned to
userspace may point into the middle of a block, but gocryptfs always
reads and writes whole blocks of 4128 bytes (= 4096 bytes of user data),
except at EOF. So it may as well round up at seek time already.

This would also mean that the offsets returned to userspace get aligned
to 4096 bytes, and generic/286 would just work as is. Another filesystem
that does not align to 4096 bytes may still hit the underflow, but it
won't be gocryptfs :)
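
The rounding idea could look roughly like this. It is only a sketch of the
arithmetic, with a hypothetical helper name; it does not show where such a
step would sit in gocryptfs, and it ignores the EOF case and overflow near
UINT64_MAX.

```c
#include <stdint.h>

#define USER_BLOCK 4096u	/* bytes of user data per block */

/* Round a plaintext offset up to the next multiple of USER_BLOCK. */
static uint64_t round_up_to_block(uint64_t plain_off)
{
	return (plain_off + USER_BLOCK - 1) / USER_BLOCK * USER_BLOCK;
}
```

Applied to the trace [1], the returned offset 5242862 would become 5242880
(exactly 5 MiB), and 1052622 would become 1052672.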

Thanks, Jakob

[1]:
gocryptfs seek(        0, SEEK_DATA) -> translate -> ext4 seek(        0, SEEK_DATA) =         0 -> translate -> return         0
gocryptfs seek(        0, SEEK_HOLE) -> translate -> ext4 seek(        0, SEEK_HOLE) =   1060864 -> translate -> return   1052622
gocryptfs seek(  1052622, SEEK_DATA) -> translate -> ext4 seek(  1060864, SEEK_DATA) =   5283840 -> translate -> return   5242862
gocryptfs seek(  5242862, SEEK_HOLE) -> translate -> ext4 seek(  5283840, SEEK_HOLE) =   6344704 -> translate -> return   6295502
gocryptfs seek(  6295502, SEEK_DATA) -> translate -> ext4 seek(  6344704, SEEK_DATA) =  10567680 -> translate -> return  10485742
gocryptfs seek( 10485742, SEEK_HOLE) -> translate -> ext4 seek( 10567680, SEEK_HOLE) =  11628544 -> translate -> return  11538382
gocryptfs seek( 11538382, SEEK_DATA) -> translate -> ext4 seek( 11628544, SEEK_DATA) =  15851520 -> translate -> return  15728622
gocryptfs seek( 15728622, SEEK_HOLE) -> translate -> ext4 seek( 15851520, SEEK_HOLE) =  16912384 -> translate -> return  16781262
gocryptfs seek( 16781262, SEEK_DATA) -> translate -> ext4 seek( 16912384, SEEK_DATA) =  21135360 -> translate -> return  20971502
gocryptfs seek( 20971502, SEEK_HOLE) -> translate -> ext4 seek( 21135360, SEEK_HOLE) =  22196224 -> translate -> return  22024142
gocryptfs seek( 22024142, SEEK_DATA) -> translate -> ext4 seek( 22196224, SEEK_DATA) =  26419200 -> translate -> return  26214382
gocryptfs seek( 26214382, SEEK_HOLE) -> translate -> ext4 seek( 26419200, SEEK_HOLE) =  27480064 -> translate -> return  27267022
gocryptfs seek( 27267022, SEEK_DATA) -> translate -> ext4 seek( 27480064, SEEK_DATA) =  31703040 -> translate -> return  31457262
gocryptfs seek( 31457262, SEEK_HOLE) -> translate -> ext4 seek( 31703040, SEEK_HOLE) =  32763904 -> translate -> return  32509902
gocryptfs seek( 32509902, SEEK_DATA) -> translate -> ext4 seek( 32763904, SEEK_DATA) =  36986880 -> translate -> return  36700142
gocryptfs seek( 36700142, SEEK_HOLE) -> translate -> ext4 seek( 36986880, SEEK_HOLE) =  38047744 -> translate -> return  37752782
gocryptfs seek( 37752782, SEEK_DATA) -> translate -> ext4 seek( 38047744, SEEK_DATA) =  42270720 -> translate -> return  41943022
gocryptfs seek( 41943022, SEEK_HOLE) -> translate -> ext4 seek( 42270720, SEEK_HOLE) =  43331584 -> translate -> return  42995662
gocryptfs seek( 42995662, SEEK_DATA) -> translate -> ext4 seek( 43331584, SEEK_DATA) =  47554560 -> translate -> return  47185902
gocryptfs seek( 47185902, SEEK_HOLE) -> translate -> ext4 seek( 47554560, SEEK_HOLE) =  48615424 -> translate -> return  48238542
gocryptfs seek( 48238542, SEEK_DATA) -> translate -> ext4 seek( 48615424, SEEK_DATA) =  52838400 -> translate -> return  52428782
gocryptfs seek( 52428782, SEEK_HOLE) -> translate -> ext4 seek( 52838400, SEEK_HOLE) =  53899264 -> translate -> return  53481422
gocryptfs seek( 53481422, SEEK_DATA) -> translate -> ext4 seek( 53899264, SEEK_DATA) =  58122240 -> translate -> return  57671662
gocryptfs seek( 57671662, SEEK_HOLE) -> translate -> ext4 seek( 58122240, SEEK_HOLE) =  59183104 -> translate -> return  58724302
gocryptfs seek( 58724302, SEEK_DATA) -> translate -> ext4 seek( 59183104, SEEK_DATA) =  63406080 -> translate -> return  62914542
gocryptfs seek( 62914542, SEEK_HOLE) -> translate -> ext4 seek( 63406080, SEEK_HOLE) =  64466944 -> translate -> return  63967182
gocryptfs seek( 63967182, SEEK_DATA) -> translate -> ext4 seek( 64466944, SEEK_DATA) =  68689920 -> translate -> return  68157422
gocryptfs seek( 68157422, SEEK_HOLE) -> translate -> ext4 seek( 68689920, SEEK_HOLE) =  69750784 -> translate -> return  69210062
gocryptfs seek( 69210062, SEEK_DATA) -> translate -> ext4 seek( 69750784, SEEK_DATA) =  73973760 -> translate -> return  73400302
gocryptfs seek( 73400302, SEEK_HOLE) -> translate -> ext4 seek( 73973760, SEEK_HOLE) =  75034624 -> translate -> return  74452942
gocryptfs seek( 74452942, SEEK_DATA) -> translate -> ext4 seek( 75034624, SEEK_DATA) =  79257600 -> translate -> return  78643182
gocryptfs seek( 78643182, SEEK_HOLE) -> translate -> ext4 seek( 79257600, SEEK_HOLE) =  80318464 -> translate -> return  79695822
gocryptfs seek( 79695822, SEEK_DATA) -> translate -> ext4 seek( 80318464, SEEK_DATA) =  84541440 -> translate -> return  83886062
gocryptfs seek( 83886062, SEEK_HOLE) -> translate -> ext4 seek( 84541440, SEEK_HOLE) =  85602304 -> translate -> return  84938702
gocryptfs seek( 84938702, SEEK_DATA) -> translate -> ext4 seek( 85602304, SEEK_DATA) =  89825280 -> translate -> return  89128942
gocryptfs seek( 89128942, SEEK_HOLE) -> translate -> ext4 seek( 89825280, SEEK_HOLE) =  90886144 -> translate -> return  90181582
gocryptfs seek( 90181582, SEEK_DATA) -> translate -> ext4 seek( 90886144, SEEK_DATA) =  95109120 -> translate -> return  94371822
gocryptfs seek( 94371822, SEEK_HOLE) -> translate -> ext4 seek( 95109120, SEEK_HOLE) =  96169984 -> translate -> return  95424462
gocryptfs seek( 95424462, SEEK_DATA) -> translate -> ext4 seek( 96169984, SEEK_DATA) = 100392960 -> translate -> return  99614702
gocryptfs seek( 99614702, SEEK_HOLE) -> translate -> ext4 seek(100392960, SEEK_HOLE) = 101453824 -> translate -> return 100667342
gocryptfs seek(100667342, SEEK_DATA) -> translate -> ext4 seek(101453824, SEEK_DATA) = 105676800 -> translate -> return 104857582
gocryptfs seek(104857582, SEEK_HOLE) -> translate -> ext4 seek(105676800, SEEK_HOLE) = 106733586 -> translate -> return 105906176

[2]: https://github.com/rfjakob/gocryptfs/blob/master/Documentation/file-format.md

Patch

diff --git a/src/seek_copy_test.c b/src/seek_copy_test.c
index 0c2c6a3d..28c021e2 100644
--- a/src/seek_copy_test.c
+++ b/src/seek_copy_test.c
@@ -98,7 +98,7 @@  do_extent_copy(int src_fd, int dest_fd, off_t data_off, off_t hole_off)
 	}
 
 	while (len > 0) {
-		ssize_t nr_read = read(src_fd, buf, BUF_SIZE);
+		ssize_t nr_read = read(src_fd, buf, MIN(len, BUF_SIZE));
 		if (nr_read < 0) {
 			if (errno == EINTR)
 				continue;