Message ID | 1431145253-2019-3-git-send-email-jaegeuk@kernel.org (mailing list archive)
---|---
State | New, archived
On Fri, May 08, 2015 at 09:20:38PM -0700, Jaegeuk Kim wrote:
> These definitions will be used by inode and superblock for encryption.
How much of this crypto stuff is common with or only slightly
modified from the ext4 code? Is the behaviour and features the
same? Is the user API and management tools the same?
IMO, if there is any amount of overlap, then we should be
implementing this stuff as generic code, not propagating the same
code through multiple filesystems via copy-n-paste-n-modify. This
will simply end up with diverging code, different bugs and feature
sets, and none of the implementations will get the review and
maintenance they really require...
And, FWIW, this is the reason why I originally asked for the ext4
encryption code to be pulled up to the VFS: precisely so we didn't
end up with a rapid proliferation of individual in-filesystem
encryption implementations that are all slightly different...
Cheers,
Dave.
On Wed, May 13, 2015 at 12:02:08PM +1000, Dave Chinner wrote:
> [...]
> And, FWIW, this is the reason why I originally asked for the ext4
> encryption code to be pulled up to the VFS: precisely so we didn't
> end up with a rapid proliferation of individual in-filesystem
> encryption implementations that are all slightly different...

Totally agreed!

AFAIK, Ted ultimately wants to push the code into fs/ as a crypto
library, so I believe most of the crypto code is common.

But, in order to realize that quickly, Ted implemented the feature in
EXT4 as a first step, to finalize the on-disk and in-memory design.
Then, I've been catching up and validating that design by implementing
it in F2FS, which also helps figure out exactly which crypto code can
be common.

As Ted mentioned before, since the next Android version intends to use
per-file encryption, F2FS needs to support it as quickly as possible,
just like EXT4.

Meanwhile, I've been writing patches to push them into fs/; currently
I've done crypto.c, and crypto_key.c and crypto_fname.c are next. But
crypto_policy.c needs to be thought about differently, since it may
depend on how each filesystem stores the policy information; we cannot
force all filesystems to use xattrs, right?
Anyway, let me take some time to work on this and submit RFC patches.

Thanks,
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 12, 2015 at 11:48:02PM -0700, Jaegeuk Kim wrote:
> AFAIK, Ted ultimately wants to push the code into fs/ as a crypto
> library, so I believe most of the crypto code is common.

Can I suggest fs/crypto/ if there are going to be multiple files?

> But, in order to realize that quickly, Ted implemented the feature in
> EXT4 as a first step, to finalize the on-disk and in-memory design.
> Then, I've been catching up and validating that design by implementing
> it in F2FS, which also helps figure out exactly which crypto code can
> be common.

Excellent. That will make it easier and less error prone for other
filesystems to implement it, too!

> As Ted mentioned before, since the next Android version intends to use
> per-file encryption, F2FS needs to support it as quickly as possible,
> just like EXT4.

Fair enough.

> Meanwhile, I've been writing patches to push them into fs/; currently
> I've done crypto.c, and crypto_key.c and crypto_fname.c are next. But
> crypto_policy.c needs to be thought about differently, since it may
> depend on how each filesystem stores the policy information; we cannot
> force all filesystems to use xattrs, right?

All filesystems likely to implement per-file crypto support xattrs,
and this is exactly what xattrs are designed for. e.g. we already
require xattrs for generic security labels, ACLs, etc. Hence
per-file crypto information should also use a common, shared xattr
format. That way it only needs to be implemented once in the generic
code and there's very little (hopefully nothing!) each filesystem
has to customise to store the crypto information for each file.

Cheers,

Dave.
On Thu, May 14, 2015 at 10:37:21AM +1000, Dave Chinner wrote:
> Can I suggest fs/crypto/ if there are going to be multiple files?

No problem at all. I'll do that.

> All filesystems likely to implement per-file crypto support xattrs,
> and this is exactly what xattrs are designed for. e.g. we already
> require xattrs for generic security labels, ACLs, etc. Hence
> per-file crypto information should also use a common, shared xattr
> format. That way it only needs to be implemented once in the generic
> code and there's very little (hopefully nothing!) each filesystem
> has to customise to store the crypto information for each file.

Ok, I see. Let me take a look at that too.
Thank you for sharing your thoughts. :)

> Cheers,
>
> Dave.
Please keep in mind that I'm also working on transparent compression.
I'm watching this thread closely so that I can implement a compression
library alongside the crypto library. If there is any interest or
benefit, I would be glad to work together so that the two can be done
cooperatively at the same time.

On 05/13/2015 06:56 PM, Jaegeuk Kim wrote:
> [...]
On Thu, May 14, 2015 at 09:50:44AM -0700, Tom Marshall wrote:
> Please keep in mind that I'm also working on transparent
> compression. I'm watching this thread closely so that I can
> implement a compression library alongside the crypto library. If
> there is any interest or benefit, I would be glad to work together
> so that the two can be done cooperatively at the same time.

I can't quickly imagine how compression code could be shared with
crypto. The basic approach for compression is that X pages are
compressed into a smaller number of pages, Y, which is an X-to-Y
mapping. But this per-file encryption supports only a 1-to-1 4KB
mapping, so it can be quite a simple implementation.

Could you elaborate on your approach or design? Or code?
Whatever the case, IMO it needs to be implemented in some filesystem
first.

Thanks,
On Fri, May 15, 2015 at 06:14:24PM -0700, Jaegeuk Kim wrote:
> I can't quickly imagine how compression code could be shared with
> crypto. The basic approach for compression is that X pages are
> compressed into a smaller number of pages, Y, which is an X-to-Y
> mapping. But this per-file encryption supports only a 1-to-1 4KB
> mapping, so it can be quite a simple implementation.

No, I don't intend to share actual code with crypto -- at least not
much. I'm more interested in looking at how the crypto layer is
implemented to give me clues about how to implement a compression layer.

> Could you elaborate on your approach or design? Or code?
> Whatever the case, IMO it needs to be implemented in some filesystem
> first.

I don't really have any working code yet. I will probably get to that
in the coming few weeks. Right now I'm still working with the ugly VFS
stacking implementation that I posted initially.

The thing that I have done is dismissed the standard compression
framing formats.

zlib (and gzip) are designed for streaming, and it is quite difficult
to implement random access on them. See the example code in the zlib
source, zran.c. It's not really tenable because 32kb of prior data is
required to be kept as priming information. Even with fully
encapsulated blocks using Z_FINISH, there is still no way to skip over
data without decompressing it first to build an index.

lz4 is somewhat better in that blocks are self-contained. But block
lengths must be read sequentially. This means that reading an arbitrary
position in a file requires a proportional number of reads to find the
desired block.

So, I am working with a simple framing format that I threw together.
The header has a compression method (zlib or lz4), block size, original
input size, and a block map.
On Thu, May 14, 2015 at 10:37:21AM +1000, Dave Chinner wrote:
> Can I suggest fs/crypto/ if there are going to be multiple files?

Yes, I think we definitely want to do this as fs/crypto; it makes
things a bit more straightforward on a number of fronts; not just from
a Makefile point of view, but also in setting up a separate tree to
push into linux-next.

> Excellent. That will make it easier and less error prone for other
> filesystems to implement it, too!

Yeah, that's why I figured it would be easier to get it into ext4 and
f2fs first. We can keep the functions in sync for a release or so, and
then we can start creating patches that gradually move each of
fs/{ext4,f2fs}/{crypto,crypto_fname,crypto_policy}.c, etc. into a
common fs/crypto directory. I'm sure there will be a bit of refactoring
necessary as we go along, but it will be a lot easier to validate that
we've gotten things right once we have two file systems both depending
on the common code. If Michael and I had tried to create fs/crypto
up-front, I'm *sure* we would have gotten a number of things wrong. :-)

> All filesystems likely to implement per-file crypto support xattrs,
> and this is exactly what xattrs are designed for. [...] That way it
> only needs to be implemented once in the generic code and there's
> very little (hopefully nothing!) each filesystem has to customise to
> store the crypto information for each file.

Yes, absolutely. What's going to be more difficult in the long run is
when we want to support per-block data integrity, using (for example)
AES/GCM, which will require 32 bytes of per-block "authentication tag"
(think Message Authentication Code) that have to be committed
atomically with the data block write.

Quite frankly, this is going to be much easier for COW file systems
like f2fs and btrfs. I have some ideas about how to do this for
update-in-place file systems (using either a compression hack and/or
storing two generations worth of authentication tag per block), but
it's not pretty. (Or in the case of ext4 we could use data=journalling
mode, but that's even less pretty from a performance standpoint.)

Cheers,

						- Ted
On Sat, May 16, 2015 at 09:24:03AM -0400, Theodore Ts'o wrote:
> [...] What's going to be more difficult in the long run is
> when we want to support per-block data integrity, using (for example)
> AES/GCM, which will require 32 bytes of per-block "authentication tag"
> (think Message Authentication Code) that have to be committed
> atomically with the data block write.
> [...]

It seems to me that compression has a similar issue with framing
metadata. The block map can be stored up front, which then requires two
writes to update a data block. Or it can be stored inline, which then
requires scanning through the file sequentially for random access.

The big difference, of course, is that in the case of compression, we
have the luxury of assuming or mandating read-only/read-mostly.
On Fri, May 15, 2015 at 09:47:05PM -0700, Tom Marshall wrote:
> No, I don't intend to share actual code with crypto -- at least not
> much. I'm more interested in looking at how the crypto layer is
> implemented to give me clues about how to implement a compression
> layer.

Ok, I see.

Currently, I've been writing up fs/crypto with the crypto code shared
between ext4 and f2fs; I refactored the existing code a little bit,
though. I'm going to introduce a fscrypt_operations for each
filesystem, like this:

    struct fscrypt_operations {
            int (*get_context)(struct inode *, void *, size_t, void *);
            int (*set_context)(struct inode *, const void *, size_t, int, void *);
            int (*prepare_new_context)(struct inode *);
            bool (*is_encrypted)(struct inode *);
            bool (*empty_dir)(struct inode *);
            unsigned (*max_namelen)(struct inode *);
    };

And the following two basic functions will be used by filesystems:
fscrypt_encrypt_page() and fscrypt_decrypt_page().

> So, I am working with a simple framing format that I threw together.
> The header has a compression method (zlib or lz4), block size,
> original input size, and a block map.

Thank you for sharing the approach; it makes sense for improving random
read performance.

Thanks,
So I've been playing around a bit and I have a basic strategy laid out.
Please let me know if I'm on the right track.

Compressed file attributes
==========================

The filesystem is responsible for detecting whether a file is
compressed and hooking into the compression lib. This may be done with
an inode flag, xattr, or any other applicable method. No other special
attributes are necessary.

Compressed file format
======================

Compressed files shall have header, block map, and data sections.

Header:

    byte[4]  magic      'zzzz' (not strictly needed)
    byte     param1     method and flags
                        bits 0..3 = compression method (1=zlib, 2=lz4, etc.)
                        bits 4..7 = flags (none defined yet)
    byte     blocksize  log2 of blocksize (max 31)
    le48     orig_size  original uncompressed file size

Block map:

Vector of le16 (if blocksize <= 16) or le32 (if blocksize > 16). Each
entry is the compressed size of the block. Zero indicates that the
block is stored uncompressed, in case compression expanded the block.

Data:

Compressed data.

Compression library
===================

I'm just fleshing this out and trying to make it work. The basic
strategy is to wrap the underlying readpage/readpages. Right now I have
the following:

    /*
     * For convenience.
     */
    typedef int (*readpage_t)(struct page *);

    /*
     * Indicate whether compression is enabled. It may be desirable to
     * disable compression for test, backup, or maintenance activities.
     * Controlled by a sysfs file.
     */
    extern int zfile_enabled(void);

    /*
     * Initialize internal data structures for a given inode. This will
     * result in reading the file header and block map, so the inode must
     * be fully populated and ready to accept readpage requests.
     */
    extern int zinode_init(struct inode *inode, readpage_t lower_readpage);

    /*
     * Wrapper for the filesystem's readpage. Consults the block map,
     * reads the appropriate compressed pages for the requested block,
     * decompresses them, and populates the page with uncompressed data.
     */
    extern int zinode_readpage(readpage_t lower_readpage, struct page *page);

    /*
     * As above, but for multiple pages.
     */
    extern int zinode_readpages(readpage_t lower_readpage,
                                struct address_space *mapping,
                                struct list_head *pages, unsigned nr_pages);

Questions and issues
====================

Should there be any padding for the data blocks? For example, if
writing is to be supported, padding the compressed data to the
filesystem block size would allow for easy rewriting of individual
blocks without disturbing the surrounding blocks. Perhaps padding could
be indicated by a flag.

Note readpage_t does not take a file ptr, as it's not available at
inode creation time. ext4 only uses it to look up the inode ptr. Is
this true across all filesystems, or do some actually require the file
ptr in their readpage?

The compression code must be able to read pages from the underlying
filesystem. This involves using the pagecache. But the uncompressed
data is what ultimately should end up in the pagecache. This is where
I'm currently stuck. How do I implement the code such that the
underlying compressed data may be read (using the pagecache or not)
while not disturbing the pagecache for the uncompressed data? I'm
wondering if I need to create an internal address_space to pass down
into the underlying readpage? Or is there another way to do this?
> The compression code must be able to read pages from the underlying
> filesystem. This involves using the pagecache. But the uncompressed
> data is what ultimately should end up in the pagecache. This is where
> I'm currently stuck. How do I implement the code such that the
> underlying compressed data may be read (using the pagecache or not)
> while not disturbing the pagecache for the uncompressed data? I'm
> wondering if I need to create an internal address_space to pass down
> into the underlying readpage? Or is there another way to do this?

I created a private address_space for the inode that is passed into the
lower readpage. That seems to be working. Is this a cool thing to do
(hopefully)?
On Wed, May 20, 2015 at 10:46:35AM -0700, Tom Marshall wrote:
> So I've been playing around a bit and I have a basic strategy laid out.
> Please let me know if I'm on the right track.
>
> Compressed file attributes
> ==========================
>
> The filesystem is responsible for detecting whether a file is
> compressed and hooking into the compression lib.  This may be done with
> an inode flag, xattr, or any other applicable method.  No other special
> attributes are necessary.

So I assume what you are implementing is read-only compression; that is,
once the file is written, and the attribute set indicating that this is a
compressed file, it is now immutable.

> Compressed file format
> ======================
>
> Compressed files shall have header, block map, and data sections.
>
> Header:
>
>     byte[4]  magic      'zzzz' (not strictly needed)
>     byte     param1     method and flags
>                           bits 0..3 = compression method (1=zlib, 2=lz4, etc.)
>                           bits 4..7 = flags (none defined yet)
>     byte     blocksize  log2 of blocksize (max 31)

I suggest using the term "compression cluster" to distinguish this from
the file system block size.

>     le48     orig_size  original uncompressed file size
>
> Block map:
>
> Vector of le16 (if blocksize <= 16) or le32 (if blocksize > 16).  Each
> entry is the compressed size of the block.  Zero indicates that the
> block is stored uncompressed, in case compression expanded the block.

What I would store instead is a list of 32 or 64-bit offsets, where the
nth entry in the array indicates the starting offset of the nth
compression cluster.

> Questions and issues
> ====================
>
> Should there be any padding for the data blocks?  For example, if
> writing is to be supported, padding the compressed data to the
> filesystem block size would allow for easy rewriting of individual
> blocks without disturbing the surrounding blocks.  Perhaps padding
> could be indicated by a flag.

If you add padding then you defeat the whole point of adding compression.
What if the initial contents of a 64k cluster were all zeros, so it
trivially compresses down to a few dozen bytes, but then it gets replaced
by completely uncompressible data?  If you add 64k worth of padding to
each block, then you're not saving any space, so what's the point?

> The compression code must be able to read pages from the underlying
> filesystem.  This involves using the pagecache.  But the uncompressed
> data is what ultimately should end up in the pagecache.  This is where
> I'm currently stuck.  How do I implement the code such that the
> underlying compressed data may be read (using the pagecache or not)
> while not disturbing the pagecache for the uncompressed data?  I'm
> wondering if I need to create an internal address_space to pass down
> into the underlying readpage?  Or is there another way to do this?

So I would *not* reference the compressed data via the page cache.  If
you do that, then you end up wasting space in the page cache, since the
page cache will contain both the compressed and decompressed data --- and
once the data has been decompressed, the compressed version is completely
useless.  So it's better to have the file system supply the physical
location on disk, and then to read the compressed data into a scratch set
of pages which are freed immediately after you are done decompressing
things.

This is why compression is so very different from encryption.  The
constraints make it quite different.

Regards,

					- Ted
So I have this all working as described.  I haven't implemented readahead
yet (readpages) so it's slow.  I'll be doing that next.

The other thing to note is that since the uncompressed size is stored
inside the file data, stat() requires reading both the inode and the
first page of the file.  That's not optimal, but I don't know if other
generic out-of-band solutions (such as xattrs) would be any better.  I
suppose it depends on whether the xattr info is read in with the inode
or not.

Also, on the subject of file size, I'm currently swapping out the i_size
for the compressed i_size before calling down into the filesystem's
readpage.  Yes, that's a nasty hack that I'll need to address.

On Wed, May 20, 2015 at 05:36:41PM -0400, Theodore Ts'o wrote:
> On Wed, May 20, 2015 at 10:46:35AM -0700, Tom Marshall wrote:
> > So I've been playing around a bit and I have a basic strategy laid
> > out.  Please let me know if I'm on the right track.
> >
> > Compressed file attributes
> > ==========================
> >
> > The filesystem is responsible for detecting whether a file is
> > compressed and hooking into the compression lib.  This may be done
> > with an inode flag, xattr, or any other applicable method.  No other
> > special attributes are necessary.
>
> So I assume what you are implementing is read-only compression; that
> is, once the file is written, and the attribute set indicating that
> this is a compressed file, it is now immutable.

That is TBD.  Our use case right now only requires read-only, but I
think read-write would be a nice thing if it's not too convoluted.

fallocate() is supported on the major filesystems, and I imagine the
same mechanisms could be used to provide rewriting of the "compression
clusters".

> > Compressed file format
> > ======================
> >
> > Compressed files shall have header, block map, and data sections.
> >
> > Header:
> >
> >     byte[4]  magic      'zzzz' (not strictly needed)
> >     byte     param1     method and flags
> >                           bits 0..3 = compression method (1=zlib, 2=lz4, etc.)
> >                           bits 4..7 = flags (none defined yet)
> >     byte     blocksize  log2 of blocksize (max 31)
>
> I suggest using the term "compression cluster" to distinguish this
> from the file system block size.

Sure, just a name...

> >     le48     orig_size  original uncompressed file size
> >
> > Block map:
> >
> > Vector of le16 (if blocksize <= 16) or le32 (if blocksize > 16).
> > Each entry is the compressed size of the block.  Zero indicates that
> > the block is stored uncompressed, in case compression expanded the
> > block.
>
> What I would store instead is a list of 32 or 64-bit offsets, where
> the nth entry in the array indicates the starting offset of the nth
> compression cluster.

Why?  This would both increase the space requirements and require some
other mechanism to indicate uncompressed compression clusters (eg.
setting the high bit or something).

> > Questions and issues
> > ====================
> >
> > Should there be any padding for the data blocks?  For example, if
> > writing is to be supported, padding the compressed data to the
> > filesystem block size would allow for easy rewriting of individual
> > blocks without disturbing the surrounding blocks.  Perhaps padding
> > could be indicated by a flag.
>
> If you add padding then you defeat the whole point of adding
> compression.  What if the initial contents of a 64k cluster was all
> zeros, so it trivially compresses down to a few dozen bytes; but then
> it gets replaced by completely uncompressible data?  If you add 64k
> worth of padding to each block, then you're not saving any space, so
> what's the point?

Sorry, I meant padding to the filesystem block size.  So, for example,
if a 64kb compression cluster is compressed to 31kb, it would use 8*4kb
blocks and the next compression cluster would start on a new block.
> > The compression code must be able to read pages from the underlying
> > filesystem.  This involves using the pagecache.  But the uncompressed
> > data is what ultimately should end up in the pagecache.  This is
> > where I'm currently stuck.  How do I implement the code such that the
> > underlying compressed data may be read (using the pagecache or not)
> > while not disturbing the pagecache for the uncompressed data?  I'm
> > wondering if I need to create an internal address_space to pass down
> > into the underlying readpage?  Or is there another way to do this?
>
> So I would *not* reference the compressed data via the page cache.  If
> you do that, then you end up wasting space in the page cache, since
> the page cache will contain both the compressed and decompressed data
> --- and once the data has been decompressed, the compressed version is
> completely useless.  So it's better to have the file system supply the
> physical location on disk, and then to read the compressed data into a
> scratch set of pages which are freed immediately after you are done
> decompressing things.

I'm currently using an internal address_space to pass down into the
underlying filesystem to make things easy.  I'm too inexperienced in
filesystem development to unravel how to plumb in anything else (but I'm
learning quickly!)  If you have ideas about how to do the underlying
readpage without the pagecache, please enlighten me.  Or perhaps I could
manually release the page from the private cache after the uncompressed
data has been extracted?

> This is why compression is so very different from encryption.  The
> constraints make it quite different.
>
> Regards,
>
>					- Ted
On Wed, May 20, 2015 at 03:46:30PM -0700, Tom Marshall wrote:
> So I have this all working as described.  I haven't implemented
> readahead yet (readpages) so it's slow.  I'll be doing that next.
>
> The other thing to note is that since the uncompressed size is stored
> inside the file data, stat() requires reading both the inode and the
> first page of the file.  That's not optimal, but I don't know if other
> generic out-of-band solutions (such as xattrs) would be any better.  I
> suppose it depends on whether the xattr info is read in with the inode
> or not.
>
> Also, on the subject of file size, I'm currently swapping out the
> i_size for the compressed i_size before calling down into the
> filesystem's readpage.  Yes, that's a nasty hack that I'll need to
> address.
>
> On Wed, May 20, 2015 at 05:36:41PM -0400, Theodore Ts'o wrote:
> > So I assume what you are implementing is read-only compression; that
> > is, once the file is written, and the attribute set indicating that
> > this is a compressed file, it is now immutable.
>
> That is TBD.  Our use case right now only requires read-only, but I
> think read-write would be a nice thing if it's not too convoluted.
>
> fallocate() is supported on the major filesystems, and I imagine the
> same mechanisms could be used to provide rewriting of the "compression
> clusters".

It just occurred to me that this could be the answer to both a
consistent i_size and providing read-write access.
If compression clusters were implemented by holes in the underlying file
and the header+blockmap were not stored in the file proper, then the
compressed and uncompressed file sizes would match.  Further, rewriting
a compression cluster would be a simple matter of adjusting the
allocated compression cluster blocks as fallocate() provides.

I honestly don't have a clue how to store the header+blockmap outside
the file yet, but if we could, that would seem to tie everything
together quite nicely.

Thoughts?
As I mentioned earlier, I had a hack working for transparent file
compression.  But it has a few problems.  The two major ones are (1) it
reads the lower pages synchronously by waiting on pages with
wait_on_page_locked(), destroying readahead, and (2) it swaps out i_size
to call the lower readpage.  So I'm ditching that and going another
route.

My idea now is to create a wrapper inode that is passed back to the VFS
layer.  The wrapper inode would have the correct i_size, blkbits, and
implement address_space_operations.  The implementation of
readpage/readpages would then call down into the lower filesystem's
readpage/readpages to fetch compressed data.  When the required pages
are available, it would decompress and populate its own requested pages.

But one thing I'm wrestling with is how to be asynchronously notified
when the lower readpage/readpages complete.  The ideas that come to mind
are (1) plumbing a callback into mpage_end_io(), (2) allowing override
of mpage_end_io() with a custom function, or (3) creating kernel threads
analogous to kblockd to wait for pending pages.

But again, I'm new at this, so maybe there's an easier or better way?
On Wed, May 27, 2015 at 11:53:17AM -0700, Tom Marshall wrote:
> But one thing I'm wrestling with is how to be asynchronously notified
> when the lower readpage/readpages complete.  The ideas that come to
> mind are (1) plumbing a callback into mpage_end_io(), (2) allowing
> override of mpage_end_io() with a custom function, or (3) creating
> kernel threads analogous to kblockd to wait for pending pages.

Not all file systems use mpage_end_io(), so that's not a good solution.

You can do something like

	wait_on_page_bit(page, PG_uptodate);

... although to be robust you will also need to wake up if PG_error is
set (if there is an I/O error, PG_error is set instead of PG_uptodate).
So that means you'd have to spin your own wait function using the
waitqueue primitives and page_waitqueue(), using __wait_on_bit() as an
initial model.

This suggestion should not be taken as an endorsement of your
higher-level architecture.  I suggest you think very carefully about
whether or not you need to be able to support random write
functionality, and if you don't, there are simpler ways such as the one
I outlined to you earlier.

Regards, and good luck,

					- Ted
On Wed, May 27, 2015 at 07:38:00PM -0400, Theodore Ts'o wrote:
> Not all file systems use mpage_end_io(), so that's not a good
> solution.

Ah, thanks, I was not aware of that.  So that leaves waiting on pages,
which probably means a fair amount of plumbing to do correctly.

> You can do something like
>
>	wait_on_page_bit(page, PG_uptodate);
>
> ... although to be robust you will also need to wake up if PG_error is
> set (if there is an I/O error, PG_error is set instead of PG_uptodate).
> So that means you'd have to spin your own wait function using the
> waitqueue primitives and page_waitqueue(), using __wait_on_bit() as an
> initial model.

Right, that should be pretty easy.

> This suggestion should not be taken as an endorsement of your
> higher-level architecture.  I suggest you think very carefully about
> whether or not you need to be able to support random write
> functionality, and if you don't, there are simpler ways such as the
> one I outlined to you earlier.

I recall this:

> [...] So it's better to have the file system supply the physical
> location on disk, and then to read the compressed data into a scratch
> set of pages which are freed immediately after you are done
> decompressing things.

Is that what you're referring to?  If so, I'm not seeing how this makes
things simpler.  It's still asynchronous, right?  ext4_readpage calls
into mpage_readpage, which uses ext4_get_block, which then queues a bio
request.  I don't see any way to avoid queueing asynchronous bio
requests or even getting completion notifications.
Now, I could pass ext4_get_block up into my code and set up my own bio
requests so that I can get callbacks.  But this basically means
implementing the equivalent of do_mpage_readpage in my own code, and
that's not really trivial code to copy/paste/hack.  And it also doesn't
address filesystems that don't use mpage_end_io(), right?

Am I missing something?
On Wed, May 27, 2015 at 07:38:00PM -0400, Theodore Ts'o wrote:
> Not all file systems use mpage_end_io(), so that's not a good
> solution.
>
> You can do something like
>
>	wait_on_page_bit(page, PG_uptodate);
>
> ... although to be robust you will also need to wake up if PG_error is
> set (if there is an I/O error, PG_error is set instead of PG_uptodate).
> So that means you'd have to spin your own wait function using the
> waitqueue primitives and page_waitqueue(), using __wait_on_bit() as an
> initial model.

My current thought is to use a workqueue and queue a work object for
each cluster read that is in flight.  The work function does
wait_on_page_locked for each lower page.  If/when all complete without
error, the data is then decompressed and the upper pages are entered.
This is currently a work in progress; it's not yet functional.

So basically my questions here are:

(1) Does this seem sane?

(2) Is it ok to wait on !locked instead of waiting on (uptodate|error)?

(3) Do I need to mark a cluster pending to prevent simultaneous reads
    of the same cluster (especially since the cluster size will be much
    larger than the lower block size)?

Also note I've now realized that the logical conclusion to the wrapper
inode idea is a stacked filesystem.  That's not the direction we are
aiming for, so instead I'm adding a "struct xcomp_inode_info" to
ext4_inode_info, which can then be passed back into the xcomp functions.
So I've just gotten this all working.  The last notable change I made
was to inode size: I added an i_compressed_size member and then did some
macro hackage to ensure that the fs implementation (eg. fs/ext4) sees
the compressed size while everything else sees the uncompressed size.
I'll be testing further tomorrow.

On Thu, May 28, 2015 at 01:55:15PM -0700, Tom Marshall wrote:
> My current thought is to use a workqueue and queue a work object for
> each cluster read that is in flight.  The work function does
> wait_on_page_locked for each lower page.  If/when all complete without
> error, the data is then decompressed and the upper pages are entered.
> This is currently a work in progress; it's not yet functional.
>
> So basically my questions here are:
>
> (1) Does this seem sane?
>
> (2) Is it ok to wait on !locked instead of waiting on (uptodate|error)?
>
> (3) Do I need to mark a cluster pending to prevent simultaneous reads
>     of the same cluster (especially since the cluster size will be
>     much larger than the lower block size)?
>
> Also note I've now realized that the logical conclusion to the wrapper
> inode idea is a stacked filesystem.  That's not the direction we are
> aiming for, so instead I'm adding a "struct xcomp_inode_info" to
> ext4_inode_info, which can then be passed back into the xcomp
> functions.
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 477e65f..c3c4deb 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -70,6 +70,8 @@ struct f2fs_mount_info {
 	unsigned int	opt;
 };
 
+#define F2FS_FEATURE_ENCRYPT		0x0001
+
 #define F2FS_HAS_FEATURE(sb, mask)	\
 	((F2FS_SB(sb)->raw_super->feature & cpu_to_le32(mask)) != 0)
 #define F2FS_SET_FEATURE(sb, mask)	\
@@ -346,6 +348,7 @@ struct f2fs_map_blocks {
  */
 #define FADVISE_COLD_BIT	0x01
 #define FADVISE_LOST_PINO_BIT	0x02
+#define FADVISE_ENCRYPT_BIT	0x04
 
 #define file_is_cold(inode)	is_file(inode, FADVISE_COLD_BIT)
 #define file_wrong_pino(inode)	is_file(inode, FADVISE_LOST_PINO_BIT)
@@ -353,6 +356,16 @@ struct f2fs_map_blocks {
 #define file_lost_pino(inode)	set_file(inode, FADVISE_LOST_PINO_BIT)
 #define file_clear_cold(inode)	clear_file(inode, FADVISE_COLD_BIT)
 #define file_got_pino(inode)	clear_file(inode, FADVISE_LOST_PINO_BIT)
+#define file_is_encrypt(inode)	is_file(inode, FADVISE_ENCRYPT_BIT)
+#define file_set_encrypt(inode)	set_file(inode, FADVISE_ENCRYPT_BIT)
+#define file_clear_encrypt(inode)	clear_file(inode, FADVISE_ENCRYPT_BIT)
+
+/* Encryption algorithms */
+#define F2FS_ENCRYPTION_MODE_INVALID		0
+#define F2FS_ENCRYPTION_MODE_AES_256_XTS	1
+#define F2FS_ENCRYPTION_MODE_AES_256_GCM	2
+#define F2FS_ENCRYPTION_MODE_AES_256_CBC	3
+#define F2FS_ENCRYPTION_MODE_AES_256_CTS	4
 
 #define DEF_DIR_LEVEL		0
 
@@ -380,6 +393,11 @@ struct f2fs_inode_info {
 	struct radix_tree_root inmem_root;	/* radix tree for inmem pages */
 	struct list_head inmem_pages;	/* inmemory pages managed by f2fs */
 	struct mutex inmem_lock;	/* lock for inmemory pages */
+
+#ifdef CONFIG_F2FS_FS_ENCRYPTION
+	/* Encryption params */
+	struct f2fs_crypt_info *i_crypt_info;
+#endif
 };
 
 static inline void get_extent_info(struct extent_info *ext,
@@ -1891,4 +1909,40 @@ void f2fs_delete_inline_entry(struct f2fs_dir_entry *, struct page *,
 						struct inode *, struct inode *);
 bool f2fs_empty_inline_dir(struct inode *);
 int f2fs_read_inline_dir(struct file *, struct dir_context *);
+
+/*
+ * crypto support
+ */
+static inline int f2fs_encrypted_inode(struct inode *inode)
+{
+#ifdef CONFIG_F2FS_FS_ENCRYPTION
+	return file_is_encrypt(inode);
+#else
+	return 0;
+#endif
+}
+
+static inline void f2fs_set_encrypted_inode(struct inode *inode)
+{
+#ifdef CONFIG_F2FS_FS_ENCRYPTION
+	file_set_encrypt(inode);
+#endif
+}
+
+static inline bool f2fs_bio_encrypted(struct bio *bio)
+{
+#ifdef CONFIG_F2FS_FS_ENCRYPTION
+	return unlikely(bio->bi_private != NULL);
+#else
+	return false;
+#endif
+}
+
+static inline int f2fs_sb_has_crypto(struct super_block *sb)
+{
+#ifdef CONFIG_F2FS_FS_ENCRYPTION
+	return F2FS_HAS_FEATURE(sb, F2FS_FEATURE_ENCRYPT);
+#else
+	return 0;
+#endif
+}
 #endif
diff --git a/fs/f2fs/f2fs_crypto.h b/fs/f2fs/f2fs_crypto.h
new file mode 100644
index 0000000..cfc37c1
--- /dev/null
+++ b/fs/f2fs/f2fs_crypto.h
@@ -0,0 +1,149 @@
+/*
+ * linux/fs/f2fs/f2fs_crypto.h
+ *
+ * Copied from linux/fs/ext4/ext4_crypto.h
+ *
+ * Copyright (C) 2015, Google, Inc.
+ *
+ * This contains encryption header content for f2fs
+ *
+ * Written by Michael Halcrow, 2015.
+ * Modified by Jaegeuk Kim, 2015.
+ */
+#ifndef _F2FS_CRYPTO_H
+#define _F2FS_CRYPTO_H
+
+#include <linux/fs.h>
+
+#define F2FS_KEY_DESCRIPTOR_SIZE	8
+
+/* Policy provided via an ioctl on the topmost directory */
+struct f2fs_encryption_policy {
+	char version;
+	char contents_encryption_mode;
+	char filenames_encryption_mode;
+	char flags;
+	char master_key_descriptor[F2FS_KEY_DESCRIPTOR_SIZE];
+} __attribute__((__packed__));
+
+#define F2FS_ENCRYPTION_CONTEXT_FORMAT_V1	1
+#define F2FS_KEY_DERIVATION_NONCE_SIZE		16
+
+#define F2FS_POLICY_FLAGS_PAD_4		0x00
+#define F2FS_POLICY_FLAGS_PAD_8		0x01
+#define F2FS_POLICY_FLAGS_PAD_16	0x02
+#define F2FS_POLICY_FLAGS_PAD_32	0x03
+#define F2FS_POLICY_FLAGS_PAD_MASK	0x03
+#define F2FS_POLICY_FLAGS_VALID		0x03
+
+/**
+ * Encryption context for inode
+ *
+ * Protector format:
+ *  1 byte: Protector format (1 = this version)
+ *  1 byte: File contents encryption mode
+ *  1 byte: File names encryption mode
+ *  1 byte: Flags
+ *  8 bytes: Master Key descriptor
+ *  16 bytes: Encryption Key derivation nonce
+ */
+struct f2fs_encryption_context {
+	char format;
+	char contents_encryption_mode;
+	char filenames_encryption_mode;
+	char flags;
+	char master_key_descriptor[F2FS_KEY_DESCRIPTOR_SIZE];
+	char nonce[F2FS_KEY_DERIVATION_NONCE_SIZE];
+} __attribute__((__packed__));
+
+/* Encryption parameters */
+#define F2FS_XTS_TWEAK_SIZE		16
+#define F2FS_AES_128_ECB_KEY_SIZE	16
+#define F2FS_AES_256_GCM_KEY_SIZE	32
+#define F2FS_AES_256_CBC_KEY_SIZE	32
+#define F2FS_AES_256_CTS_KEY_SIZE	32
+#define F2FS_AES_256_XTS_KEY_SIZE	64
+#define F2FS_MAX_KEY_SIZE		64
+
+struct f2fs_encryption_key {
+	__u32 mode;
+	char raw[F2FS_MAX_KEY_SIZE];
+	__u32 size;
+} __attribute__((__packed__));
+
+struct f2fs_crypt_info {
+	unsigned char	ci_mode;
+	unsigned char	ci_size;
+	char		ci_data_mode;
+	char		ci_filename_mode;
+	char		ci_flags;
+	struct crypto_ablkcipher *ci_ctfm;
+	struct key	*ci_keyring_key;
+	char		ci_raw[F2FS_MAX_KEY_SIZE];
+	char		ci_master_key[F2FS_KEY_DESCRIPTOR_SIZE];
+};
+
+#define F2FS_CTX_REQUIRES_FREE_ENCRYPT_FL		0x00000001
+#define F2FS_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL	0x00000002
+
+struct f2fs_crypto_ctx {
+	struct crypto_tfm *tfm;		/* Crypto API context */
+	struct page *bounce_page;	/* Ciphertext page on write path */
+	struct page *control_page;	/* Original page on write path */
+	struct bio *bio;		/* The bio for this context */
+	struct work_struct work;	/* Work queue for read complete path */
+	struct list_head free_list;	/* Free list */
+	int flags;			/* Flags */
+	int mode;			/* Encryption mode for tfm */
+};
+
+struct f2fs_completion_result {
+	struct completion completion;
+	int res;
+};
+
+#define DECLARE_F2FS_COMPLETION_RESULT(ecr) \
+	struct f2fs_completion_result ecr = { \
+		COMPLETION_INITIALIZER((ecr).completion), 0 }
+
+static inline int f2fs_encryption_key_size(int mode)
+{
+	switch (mode) {
+	case F2FS_ENCRYPTION_MODE_AES_256_XTS:
+		return F2FS_AES_256_XTS_KEY_SIZE;
+	case F2FS_ENCRYPTION_MODE_AES_256_GCM:
+		return F2FS_AES_256_GCM_KEY_SIZE;
+	case F2FS_ENCRYPTION_MODE_AES_256_CBC:
+		return F2FS_AES_256_CBC_KEY_SIZE;
+	case F2FS_ENCRYPTION_MODE_AES_256_CTS:
+		return F2FS_AES_256_CTS_KEY_SIZE;
+	default:
+		BUG();
+	}
+	return 0;
+}
+
+#define F2FS_FNAME_NUM_SCATTER_ENTRIES	4
+#define F2FS_CRYPTO_BLOCK_SIZE		16
+#define F2FS_FNAME_CRYPTO_DIGEST_SIZE	32
+
+/**
+ * For encrypted symlinks, the ciphertext length is stored at the beginning
+ * of the string in little-endian format.
+ */
+struct f2fs_encrypted_symlink_data {
+	__le16 len;
+	char encrypted_path[1];
+} __attribute__((__packed__));
+
+/**
+ * This function is used to calculate the disk space required to
+ * store a filename of length l in encrypted symlink format.
+ */
+static inline u32 encrypted_symlink_data_len(u32 l)
+{
+	if (l < F2FS_CRYPTO_BLOCK_SIZE)
+		l = F2FS_CRYPTO_BLOCK_SIZE;
+	return (l + sizeof(struct f2fs_encrypted_symlink_data) - 1);
+}
+#endif	/* _F2FS_CRYPTO_H */
diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h
index d44e97f..920408a 100644
--- a/include/linux/f2fs_fs.h
+++ b/include/linux/f2fs_fs.h
@@ -91,7 +91,9 @@ struct f2fs_super_block {
 	__u8 version[VERSION_LEN];	/* the kernel version */
 	__u8 init_version[VERSION_LEN];	/* the initial kernel version */
 	__le32 feature;			/* defined features */
-	__u8 reserved[888];		/* valid reserved region */
+	__u8 encryption_level;		/* versioning level for encryption */
+	__u8 encrypt_pw_salt[16];	/* Salt used for string2key algorithm */
+	__u8 reserved[871];		/* valid reserved region */
 } __packed;
 
 /*
These definitions will be used by the inode and superblock for
encryption.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/f2fs.h          |  54 ++++++++++++++++++
 fs/f2fs/f2fs_crypto.h   | 149 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/f2fs_fs.h |   4 +-
 3 files changed, 206 insertions(+), 1 deletion(-)
 create mode 100644 fs/f2fs/f2fs_crypto.h