Message ID | 20210522184814.95802-1-jakobunt@gmail.com |
---|---|
State | New, archived |
Series | generic/286: fix integer underflow on block sizes != 4096 |
On Sat, May 22, 2021 at 08:48:14PM +0200, Jakob Unterwurzacher wrote:
> The read loop always requested 4096 bytes, which only works
> when the total read length is a multiple of 4096 bytes.

The total read length should be

"The length of this extent is (hole_off - data_off)"

according to the comments above do_extent_copy(). Total read length
being not a multiple of 4k means 'data_off' or 'hole_off' is not 4k
aligned.

>
> This is not necessarily true, and when it's not, len wraps

But generic/286 creates source files with length of all data extents and
hole extents being multiple of 4k. So I still don't understand why this
is valid for gocryptfs. Shouldn't that be a bug in seek_data/seek_hole
in gocryptfs? Could you please elaborate?

Thanks,
Eryu

> around to a value near UINT64_MAX and you get a lot of these:
>
> ERROR: [error:38] reached EOF:Success
>
> This was caught when running xfstests against gocryptfs,
> an encrypted overlay file system.
>
> On ext4, the test still passes after this change.
>
> Signed-off-by: Jakob Unterwurzacher <jakobunt@gmail.com>
> ---
>  src/seek_copy_test.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/seek_copy_test.c b/src/seek_copy_test.c
> index 0c2c6a3d..28c021e2 100644
> --- a/src/seek_copy_test.c
> +++ b/src/seek_copy_test.c
> @@ -98,7 +98,7 @@ do_extent_copy(int src_fd, int dest_fd, off_t data_off, off_t hole_off)
>  	}
>
>  	while (len > 0) {
> -		ssize_t nr_read = read(src_fd, buf, BUF_SIZE);
> +		ssize_t nr_read = read(src_fd, buf, MIN(len, BUF_SIZE));
>  		if (nr_read < 0) {
>  			if (errno == EINTR)
>  				continue;
> --
> 2.31.1
On Sun, May 23, 2021 at 11:05 AM Eryu Guan <guan@eryu.me> wrote:
> The total read length should be
>
> "The length of this extent is (hole_off - data_off)"
>
> according to the comments above do_extent_copy(). Total read length
> being not a multiple of 4k means 'data_off' or 'hole_off' is not 4k
> aligned.

That is correct.

> But generic/286 creates source files with length of all data extents and
> hole extents being multiple of 4k. So I still don't understand why this
> is valid for gocryptfs. Shouldn't that be a bug in seek_data/seek_hole
> in gocryptfs? Could you please elaborate?

Yes sure, the situation is a bit complicated. gocryptfs works similarly
to eCryptFS and EncFS (also overlay filesystems).
The files are stored in encrypted form in regular files on ext4 or xfs
or whatever "real disk" filesystem.
Disk space allocation & file holes are handled by the real filesystem.
A gocryptfs mount shows a decrypted view of these files.

Now, gocryptfs uses AES-GCM for encryption. This adds 32 bytes of
overhead to every 4096-byte block, which gives a storage size of
4128 bytes.

The encryption overhead is why the files & holes created by
generic/286 are not 4k-aligned on disk when viewed through the
gocryptfs mount.

Thanks, Jakob
On Tue, May 25, 2021 at 07:34:14PM +0200, Jakob Unterwurzacher wrote:
> On Sun, May 23, 2021 at 11:05 AM Eryu Guan <guan@eryu.me> wrote:
> > The total read length should be
> >
> > "The length of this extent is (hole_off - data_off)"
> >
> > according to the comments above do_extent_copy(). Total read length
> > being not a multiple of 4k means 'data_off' or 'hole_off' is not 4k
> > aligned.
>
> That is correct.
>
> > But generic/286 creates source files with length of all data extents and
> > hole extents being multiple of 4k. So I still don't understand why this
> > is valid for gocryptfs. Shouldn't that be a bug in seek_data/seek_hole
> > in gocryptfs? Could you please elaborate?
>
> Yes sure, the situation is a bit complicated. gocryptfs works similarly
> to eCryptFS and EncFS (also overlay filesystems).
> The files are stored in encrypted form in regular files on ext4 or xfs
> or whatever "real disk" filesystem.
> Disk space allocation & file holes are handled by the real filesystem.
> A gocryptfs mount shows a decrypted view of these files.
>
> Now, gocryptfs uses AES-GCM for encryption. This adds 32 bytes of
> overhead to every 4096-byte block, which gives a storage size of
> 4128 bytes.

Ah, that makes sense to me now. Would you please include the detailed
explanation in the commit log as well?

Thanks,
Eryu

>
> The encryption overhead is why the files & holes created by
> generic/286 are not 4k-aligned on disk when viewed through the
> gocryptfs mount.
>
> Thanks, Jakob
On Wed, May 26, 2021 at 11:20:37AM +0800, Eryu Guan wrote:
> On Tue, May 25, 2021 at 07:34:14PM +0200, Jakob Unterwurzacher wrote:
> > On Sun, May 23, 2021 at 11:05 AM Eryu Guan <guan@eryu.me> wrote:
> > > The total read length should be
> > >
> > > "The length of this extent is (hole_off - data_off)"
> > >
> > > according to the comments above do_extent_copy(). Total read length
> > > being not a multiple of 4k means 'data_off' or 'hole_off' is not 4k
> > > aligned.
> >
> > That is correct.
> >
> > > But generic/286 creates source files with length of all data extents and
> > > hole extents being multiple of 4k. So I still don't understand why this
> > > is valid for gocryptfs. Shouldn't that be a bug in seek_data/seek_hole
> > > in gocryptfs? Could you please elaborate?
> >
> > Yes sure, the situation is a bit complicated. gocryptfs works similarly
> > to eCryptFS and EncFS (also overlay filesystems).
> > The files are stored in encrypted form in regular files on ext4 or xfs
> > or whatever "real disk" filesystem.
> > Disk space allocation & file holes are handled by the real filesystem.
> > A gocryptfs mount shows a decrypted view of these files.
> >
> > Now, gocryptfs uses AES-GCM for encryption. This adds 32 bytes of
> > overhead to every 4096-byte block, which gives a storage size of
> > 4128 bytes.
>
> Ah, that makes sense to me now. Would you please include the detailed
> explanation in the commit log as well?

...and maybe a sample output of a seek_data/seek_hole scan between a
gocryptfs file and the ext4fs underneath it? I'm still trying to wrap
my head around what the problem here is.

It might also help to describe where the 32 bytes of overhead go --
are you interleaving the overhead inline with 4k of encrypted content?

--D

>
> Thanks,
> Eryu
>
> >
> > The encryption overhead is why the files & holes created by
> > generic/286 are not 4k-aligned on disk when viewed through the
> > gocryptfs mount.
> >
> > Thanks, Jakob
> > Ah, that makes sense to me now. Would you please include the detailed
> > explanation in the commit log as well?
>
> ...and maybe a sample output of a seek_data/seek_hole scan between a
> gocryptfs file and the ext4fs underneath it? I'm still trying to wrap
> my head around what the problem here is.
>
> It might also help to describe where the 32 bytes of overhead go --
> are you interleaving the overhead inline with 4k of encrypted content?

Yes, it's all inline. The file format is like this [2]:

    file header ...... 18 bytes
    data block 1 ..... 16 bytes block header (IV)
                       1-4096 bytes user data
                       16 bytes block trailer (MAC)
    [more data blocks...]

I am attaching the SEEK_DATA/SEEK_HOLE trace below [1].

One seek goes like this:
(1) gocryptfs gets a seek() call via FUSE
(2) translate plaintext to ciphertext offset
(3) call ext4 seek
(4) translate back to plaintext offset and return

Actually, Eryu's first comment got me thinking about this, and what
gocryptfs does is a little stupid: the offsets that get returned to
userspace may point into the middle of a block, but gocryptfs reads and
writes in blocks of 4128 bytes (= 4096 bytes of user data, except at
EOF), so it may as well round up at seek time already.
This would also mean that the offsets returned to userspace get aligned
to 4096 bytes, and generic/286 would just work as is.
Another filesystem not aligning to 4096 bytes may hit the underflow,
but it won't be gocryptfs :)

Thanks, Jakob

[1]:
gocryptfs seek( 0, SEEK_DATA) -> translate -> ext4 seek( 0, SEEK_DATA) = 0 -> translate -> return 0
gocryptfs seek( 0, SEEK_HOLE) -> translate -> ext4 seek( 0, SEEK_HOLE) = 1060864 -> translate -> return 1052622
gocryptfs seek( 1052622, SEEK_DATA) -> translate -> ext4 seek( 1060864, SEEK_DATA) = 5283840 -> translate -> return 5242862
gocryptfs seek( 5242862, SEEK_HOLE) -> translate -> ext4 seek( 5283840, SEEK_HOLE) = 6344704 -> translate -> return 6295502
gocryptfs seek( 6295502, SEEK_DATA) -> translate -> ext4 seek( 6344704, SEEK_DATA) = 10567680 -> translate -> return 10485742
gocryptfs seek( 10485742, SEEK_HOLE) -> translate -> ext4 seek( 10567680, SEEK_HOLE) = 11628544 -> translate -> return 11538382
gocryptfs seek( 11538382, SEEK_DATA) -> translate -> ext4 seek( 11628544, SEEK_DATA) = 15851520 -> translate -> return 15728622
gocryptfs seek( 15728622, SEEK_HOLE) -> translate -> ext4 seek( 15851520, SEEK_HOLE) = 16912384 -> translate -> return 16781262
gocryptfs seek( 16781262, SEEK_DATA) -> translate -> ext4 seek( 16912384, SEEK_DATA) = 21135360 -> translate -> return 20971502
gocryptfs seek( 20971502, SEEK_HOLE) -> translate -> ext4 seek( 21135360, SEEK_HOLE) = 22196224 -> translate -> return 22024142
gocryptfs seek( 22024142, SEEK_DATA) -> translate -> ext4 seek( 22196224, SEEK_DATA) = 26419200 -> translate -> return 26214382
gocryptfs seek( 26214382, SEEK_HOLE) -> translate -> ext4 seek( 26419200, SEEK_HOLE) = 27480064 -> translate -> return 27267022
gocryptfs seek( 27267022, SEEK_DATA) -> translate -> ext4 seek( 27480064, SEEK_DATA) = 31703040 -> translate -> return 31457262
gocryptfs seek( 31457262, SEEK_HOLE) -> translate -> ext4 seek( 31703040, SEEK_HOLE) = 32763904 -> translate -> return 32509902
gocryptfs seek( 32509902, SEEK_DATA) -> translate -> ext4 seek( 32763904, SEEK_DATA) = 36986880 -> translate -> return 36700142
gocryptfs seek( 36700142, SEEK_HOLE) -> translate -> ext4 seek( 36986880, SEEK_HOLE) = 38047744 -> translate -> return 37752782
gocryptfs seek( 37752782, SEEK_DATA) -> translate -> ext4 seek( 38047744, SEEK_DATA) = 42270720 -> translate -> return 41943022
gocryptfs seek( 41943022, SEEK_HOLE) -> translate -> ext4 seek( 42270720, SEEK_HOLE) = 43331584 -> translate -> return 42995662
gocryptfs seek( 42995662, SEEK_DATA) -> translate -> ext4 seek( 43331584, SEEK_DATA) = 47554560 -> translate -> return 47185902
gocryptfs seek( 47185902, SEEK_HOLE) -> translate -> ext4 seek( 47554560, SEEK_HOLE) = 48615424 -> translate -> return 48238542
gocryptfs seek( 48238542, SEEK_DATA) -> translate -> ext4 seek( 48615424, SEEK_DATA) = 52838400 -> translate -> return 52428782
gocryptfs seek( 52428782, SEEK_HOLE) -> translate -> ext4 seek( 52838400, SEEK_HOLE) = 53899264 -> translate -> return 53481422
gocryptfs seek( 53481422, SEEK_DATA) -> translate -> ext4 seek( 53899264, SEEK_DATA) = 58122240 -> translate -> return 57671662
gocryptfs seek( 57671662, SEEK_HOLE) -> translate -> ext4 seek( 58122240, SEEK_HOLE) = 59183104 -> translate -> return 58724302
gocryptfs seek( 58724302, SEEK_DATA) -> translate -> ext4 seek( 59183104, SEEK_DATA) = 63406080 -> translate -> return 62914542
gocryptfs seek( 62914542, SEEK_HOLE) -> translate -> ext4 seek( 63406080, SEEK_HOLE) = 64466944 -> translate -> return 63967182
gocryptfs seek( 63967182, SEEK_DATA) -> translate -> ext4 seek( 64466944, SEEK_DATA) = 68689920 -> translate -> return 68157422
gocryptfs seek( 68157422, SEEK_HOLE) -> translate -> ext4 seek( 68689920, SEEK_HOLE) = 69750784 -> translate -> return 69210062
gocryptfs seek( 69210062, SEEK_DATA) -> translate -> ext4 seek( 69750784, SEEK_DATA) = 73973760 -> translate -> return 73400302
gocryptfs seek( 73400302, SEEK_HOLE) -> translate -> ext4 seek( 73973760, SEEK_HOLE) = 75034624 -> translate -> return 74452942
gocryptfs seek( 74452942, SEEK_DATA) -> translate -> ext4 seek( 75034624, SEEK_DATA) = 79257600 -> translate -> return 78643182
gocryptfs seek( 78643182, SEEK_HOLE) -> translate -> ext4 seek( 79257600, SEEK_HOLE) = 80318464 -> translate -> return 79695822
gocryptfs seek( 79695822, SEEK_DATA) -> translate -> ext4 seek( 80318464, SEEK_DATA) = 84541440 -> translate -> return 83886062
gocryptfs seek( 83886062, SEEK_HOLE) -> translate -> ext4 seek( 84541440, SEEK_HOLE) = 85602304 -> translate -> return 84938702
gocryptfs seek( 84938702, SEEK_DATA) -> translate -> ext4 seek( 85602304, SEEK_DATA) = 89825280 -> translate -> return 89128942
gocryptfs seek( 89128942, SEEK_HOLE) -> translate -> ext4 seek( 89825280, SEEK_HOLE) = 90886144 -> translate -> return 90181582
gocryptfs seek( 90181582, SEEK_DATA) -> translate -> ext4 seek( 90886144, SEEK_DATA) = 95109120 -> translate -> return 94371822
gocryptfs seek( 94371822, SEEK_HOLE) -> translate -> ext4 seek( 95109120, SEEK_HOLE) = 96169984 -> translate -> return 95424462
gocryptfs seek( 95424462, SEEK_DATA) -> translate -> ext4 seek( 96169984, SEEK_DATA) = 100392960 -> translate -> return 99614702
gocryptfs seek( 99614702, SEEK_HOLE) -> translate -> ext4 seek(100392960, SEEK_HOLE) = 101453824 -> translate -> return 100667342
gocryptfs seek(100667342, SEEK_DATA) -> translate -> ext4 seek(101453824, SEEK_DATA) = 105676800 -> translate -> return 104857582
gocryptfs seek(104857582, SEEK_HOLE) -> translate -> ext4 seek(105676800, SEEK_HOLE) = 106733586 -> translate -> return 105906176

[2]: https://github.com/rfjakob/gocryptfs/blob/master/Documentation/file-format.md
diff --git a/src/seek_copy_test.c b/src/seek_copy_test.c
index 0c2c6a3d..28c021e2 100644
--- a/src/seek_copy_test.c
+++ b/src/seek_copy_test.c
@@ -98,7 +98,7 @@ do_extent_copy(int src_fd, int dest_fd, off_t data_off, off_t hole_off)
 	}
 
 	while (len > 0) {
-		ssize_t nr_read = read(src_fd, buf, BUF_SIZE);
+		ssize_t nr_read = read(src_fd, buf, MIN(len, BUF_SIZE));
 		if (nr_read < 0) {
 			if (errno == EINTR)
 				continue;
The read loop always requested 4096 bytes, which only works
when the total read length is a multiple of 4096 bytes.

This is not necessarily true, and when it's not, len wraps
around to a value near UINT64_MAX and you get a lot of these:

ERROR: [error:38] reached EOF:Success

This was caught when running xfstests against gocryptfs,
an encrypted overlay file system.

On ext4, the test still passes after this change.

Signed-off-by: Jakob Unterwurzacher <jakobunt@gmail.com>
---
 src/seek_copy_test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)