[v2,01/15] Documentation: add newcx initramfs format description

Message ID 1516850875-25066-2-git-send-email-takondra@cisco.com (mailing list archive)
State New, archived

Commit Message

Taras Kondratiuk Jan. 25, 2018, 3:27 a.m. UTC
Many of the Linux security/integrity features are dependent on file
metadata, stored as extended attributes (xattrs), for making decisions.
These features need to be initialized during initcall and enabled as
early as possible for complete security coverage.

Initramfs (tmpfs) supports xattrs, but the newc CPIO archive format
cannot store them in the archive.

This patch describes an "extended" newc format (newcx) that is based on
newc and has the following changes:
- extended attributes support
- larger filesize field to support files >4GB
- larger mtime field to allow usec precision and more than 32 bits of
  seconds
- unused checksum field removed

Signed-off-by: Taras Kondratiuk <takondra@cisco.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Victor Kamensky <kamensky@cisco.com>
---
 Documentation/early-userspace/buffer-format.txt | 46 ++++++++++++++++++++++---
 1 file changed, 41 insertions(+), 5 deletions(-)
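
For orientation, here is a minimal sketch of how the newcx header described
by this patch could be built and parsed. The field names and widths are
copied from the cpio_newcx_header table in the patch at the end of this
page; the helper names (build_newcx_header, parse_newcx_header) are
hypothetical and are not part of the patch or of any existing tool.

```python
# Field widths copied from the cpio_newcx_header table in the patch.
NEWCX_FIELDS = [
    ("c_magic", 6), ("c_ino", 8), ("c_mode", 8), ("c_uid", 8), ("c_gid", 8),
    ("c_nlink", 8), ("c_mtime", 16), ("c_filesize", 16), ("c_maj", 8),
    ("c_min", 8), ("c_rmaj", 8), ("c_rmin", 8), ("c_namesize", 8),
    ("c_xattrs_size", 8),
]
NEWCX_MAGIC = "070703"
NEWCX_HEADER_LEN = sum(width for _, width in NEWCX_FIELDS)  # 126 bytes


def build_newcx_header(**values):
    """Hypothetical helper: encode fields as zero-padded hexadecimal ASCII."""
    parts = [NEWCX_MAGIC]
    for name, width in NEWCX_FIELDS[1:]:
        text = format(values.get(name, 0), "0{}x".format(width))
        if len(text) > width:
            raise ValueError("{} does not fit in {} hex digits".format(name, width))
        parts.append(text)
    return "".join(parts).encode("ascii")


def parse_newcx_header(buf):
    """Hypothetical helper: decode a header into a dict of integers."""
    if len(buf) < NEWCX_HEADER_LEN:
        raise ValueError("buffer shorter than a newcx header")
    fields = {}
    offset = 0
    for name, width in NEWCX_FIELDS:
        text = buf[offset:offset + width].decode("ascii")
        offset += width
        if name == "c_magic":
            if text != NEWCX_MAGIC:
                raise ValueError("not a newcx header: {!r}".format(text))
            fields[name] = text
        else:
            fields[name] = int(text, 16)
    return fields


header = build_newcx_header(c_filesize=5 * 2**32, c_namesize=4)
parsed = parse_newcx_header(header)
```

The 16-digit c_filesize field is what lets a size such as 5 * 2**32 bytes
round-trip here; the same value would not fit in an 8-digit field.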

Comments

Arnd Bergmann Jan. 25, 2018, 9:29 a.m. UTC | #1
On Thu, Jan 25, 2018 at 4:27 AM, Taras Kondratiuk <takondra@cisco.com> wrote:
> Many of the Linux security/integrity features are dependent on file
> metadata, stored as extended attributes (xattrs), for making decisions.
> These features need to be initialized during initcall and enabled as
> early as possible for complete security coverage.
>
> Initramfs (tmpfs) supports xattrs, but newc CPIO archive format does not
> support including them into the archive.
>
> This patch describes "extended" newc format (newcx) that is based on
> newc and has following changes:
> - extended attributes support
> - increased size of filesize to support files >4GB.
> - increased mtime field size to have usec precision and more than
>   32-bit of seconds.
> - removed unused checksum field.
>
> Signed-off-by: Taras Kondratiuk <takondra@cisco.com>
> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
> Signed-off-by: Victor Kamensky <kamensky@cisco.com>

Ah nice, I like the extension of the time handling, that certainly
addresses one of the issues with y2038 that we have previously
hacked around in an ugly way (interpreting the 32-bit
number as unsigned).

However, if this is to become a generally supported format
for cpio files, could we make it use nanosecond resolution
instead? The issue that I see with microseconds is that
storing a file in an archive and extracting it again would
otherwise keep the mtime stamp /almost/ identical on file
systems that have nanosecond resolution, but most of
the time a comparison would indicate that the files are
not the same.

Unfortunately, the range of a 64-bit nanoseconds counter
is still a bit limited (584 years, or half of that if we make it
signed). While this is clearly enough for the uses in
initramfs, it still has a similar problem: someone creating
a fake timestamp a long time in the past or future on
a file system would lose information after going through
cpio.

         Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Taras Kondratiuk Jan. 25, 2018, 8:26 p.m. UTC | #2
Quoting Arnd Bergmann (2018-01-25 01:29:12)
> On Thu, Jan 25, 2018 at 4:27 AM, Taras Kondratiuk <takondra@cisco.com> wrote:
> > Many of the Linux security/integrity features are dependent on file
> > metadata, stored as extended attributes (xattrs), for making decisions.
> > These features need to be initialized during initcall and enabled as
> > early as possible for complete security coverage.
> >
> > Initramfs (tmpfs) supports xattrs, but newc CPIO archive format does not
> > support including them into the archive.
> >
> > This patch describes "extended" newc format (newcx) that is based on
> > newc and has following changes:
> > - extended attributes support
> > - increased size of filesize to support files >4GB.
> > - increased mtime field size to have usec precision and more than
> >   32-bit of seconds.
> > - removed unused checksum field.
> >
> > Signed-off-by: Taras Kondratiuk <takondra@cisco.com>
> > Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
> > Signed-off-by: Victor Kamensky <kamensky@cisco.com>
> 
> Ah nice, I like the extension of the time handling, that certainly
> addresses one of the issues with y2038 that we have previously
> hacked around in an ugly way (interpreting the 32-bit
> number as unsigned).
> 
> However, if this is to become a generally supported format
> for cpio files, could we make it use nanosecond resolution
> instead? The issue that I see with microseconds is that
> storing a file in an archive and extracting it again would
> otherwise keep the mtime stamp /almost/ identical on file
> systems that have nanosecond resolution, but most of
> the time a comparison would indicate that the files are
> not the same.
> 
> Unfortunately, the range of a 64-bit nanoseconds counter
> is still a bit limited (584 years, or half of that if we make it
> signed). While this is clearly enough for the uses in
> initramfs, it still has a similar problem: someone creating
> a fake timestamp a long time in the past or future on
> a file system would lose information after going through
> cpio.

We can match statx(2) by having 64 bits for seconds plus 32 bits for
nanoseconds. For initramfs, the nanoseconds field can be ignored during
unpacking.
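
A minimal sketch of the split suggested above, assuming the timestamp is
available as a single integer nanosecond count (as Python's st_mtime_ns
provides). The function names are hypothetical, and how the two parts
would be laid out in the header is not specified in this thread.

```python
NSEC_PER_SEC = 1_000_000_000


def split_mtime_ns(mtime_ns):
    """Split a nanosecond count into (seconds, nanoseconds).

    divmod floors toward negative infinity, so the nanoseconds part is
    always in the range 0..999999999, even for times before the epoch.
    """
    return divmod(mtime_ns, NSEC_PER_SEC)


def join_mtime_ns(seconds, nanoseconds):
    """Inverse of split_mtime_ns."""
    return seconds * NSEC_PER_SEC + nanoseconds


# Illustrative value only.
sample_ns = 1516957308811135084
sec, nsec = split_mtime_ns(sample_ns)
roundtrip_ok = join_mtime_ns(sec, nsec) == sample_ns
```

Because both parts are stored separately, the round trip is lossless and
the seconds part is not constrained by the nanosecond resolution.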
--
Arnd Bergmann Jan. 25, 2018, 9:02 p.m. UTC | #3
On Thu, Jan 25, 2018 at 9:26 PM, Taras Kondratiuk <takondra@cisco.com> wrote:
> Quoting Arnd Bergmann (2018-01-25 01:29:12)
>> On Thu, Jan 25, 2018 at 4:27 AM, Taras Kondratiuk <takondra@cisco.com> wrote:
>
> We can match statx(2) by having 64 bits for seconds plus 32 bits for
> nanoseconds.

Ok.

> For initramfs nanoseconds field can be ignored during
> unpacking.

That sounds like a pointless microoptimization. Most likely we won't ever
need the nanoseconds in the initramfs, but it's trivial to just copy them
into the right field, and not adding that one source line would probably
involve adding a one-line source comment to explain the omission ;-)

      Arnd
--
Taras Kondratiuk Jan. 25, 2018, 10:13 p.m. UTC | #4
Quoting Arnd Bergmann (2018-01-25 13:02:49)
> On Thu, Jan 25, 2018 at 9:26 PM, Taras Kondratiuk <takondra@cisco.com> wrote:
> 
> > For initramfs nanoseconds field can be ignored during
> > unpacking.
> 
> That sounds like a pointless microoptimization. Most likely we won't ever
> need the nanoseconds in the initramfs, but it's trivial to just copy them
> into the right field, and not adding that one source line would probably
> involve adding a one-line source comment to explain the omission ;-)

Agree.
--
Rob Landley Jan. 26, 2018, 2:39 a.m. UTC | #5
On 01/25/2018 03:29 AM, Arnd Bergmann wrote:
> On Thu, Jan 25, 2018 at 4:27 AM, Taras Kondratiuk <takondra@cisco.com> wrote:
>> Many of the Linux security/integrity features are dependent on file
>> metadata, stored as extended attributes (xattrs), for making decisions.
>> These features need to be initialized during initcall and enabled as
>> early as possible for complete security coverage.
>>
>> Initramfs (tmpfs) supports xattrs, but newc CPIO archive format does not
>> support including them into the archive.
>>
>> This patch describes "extended" newc format (newcx) that is based on
>> newc and has following changes:
>> - extended attributes support
>> - increased size of filesize to support files >4GB.
>> - increased mtime field size to have usec precision and more than
>>   32-bit of seconds.
>> - removed unused checksum field.
>>
>> Signed-off-by: Taras Kondratiuk <takondra@cisco.com>
>> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
>> Signed-off-by: Victor Kamensky <kamensky@cisco.com>
> 
> Ah nice, I like the extension of the time handling, that certainly
> addresses one of the issues with y2038 that we have previously
> hacked around in an ugly way (interpreting the 32-bit
> number as unsigned).

Taras and I exchanged email like a year ago working out format stuff, so
I don't have any real complaints. My feedback's already worked in, and I
can make toybox cpio support -h newcx as soon as the format's finalized
and I get a free weekend.

That said, I don't think -h newcx should emit (or recognize) the
"TRAILER!!!1!" entry. That's kinda silly in-band signaling for 2018:
files have a length, pipes provide EOF, and each cpiox entry starts with
6 bytes of c_magic anyway. (I stopped toybox from producing the TRAILER
entry back in june, toybox commit 32550751997d, and the kernel consumes
the resulting cpio just fine. All the trailer does is prevent you from
concatenating cpio files, which is a feature multiple people asked me for.)

> However, if this is to become a generally supported format
> for cpio files,

After Joerg Schilling dies (or admits solaris has) it might even make it
into posix.

> could we make it use nanosecond resolution
> instead? The issue that I see with microseconds is that
> storing a file in an archive and extracting it again would
> otherwise keep the mtime stamp /almost/ identical on file
> systems that have nanosecond resolution, but most of
> the time a comparison would indicate that the files are
> not the same.

I have no strong opinion on this? The tmpfs is still going to track
nanoseconds, this is just rounding when it populates them.

> Unfortunately, the range of a 64-bit nanoseconds counter
> is still a bit limited (584 years, or half of that if we make it
> signed). While this is clearly enough for the uses in
> initramfs, it still has a similar problem: someone creating
> a fake timestamp a long time in the past or future on
> a file system would lose information after going through
> cpio.

Hence microseconds. This came up in email when we were talking about
this (like a year ago) and I decided I didn't care. :)

64 bits of microseconds is +- 584 centuries, while being accurate
enough[1] that making a getpid() syscall probably takes longer than that
on our highest end boxen, let alone doing a dentry lookup in the vfs
(even if it's hot in cache).
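
For reference, the counter ranges discussed in this thread can be checked
with a few lines of arithmetic. This is only a sketch: it assumes an
unsigned 64-bit counter and a mean Gregorian year of 365.2425 days, and
the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60  # mean Gregorian year


def counter_range_years(bits, ticks_per_second):
    """Years an unsigned counter of the given width can span."""
    return (2 ** bits) / ticks_per_second / SECONDS_PER_YEAR


ns_years = counter_range_years(64, 10**9)  # nanosecond ticks
us_years = counter_range_years(64, 10**6)  # microsecond ticks
```

Under these assumptions ns_years comes out at roughly 584 years and
us_years at roughly 584 thousand years, a factor of 1000 apart; a signed
counter halves both figures in each direction.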

Rob

[1] Is future proofing an issue here? The s-curve of moore's law started
bending down around y2k back when Intel had to recall its 1.13ghz
pentium III for having overclocked its own chip at the factory, and it's
pretty darn flat these days. Clock speeds first hit 4ghz 15 years ago
and haven't been back, most of the work since 2005 has been about
parallelism, and recent performance improvements are once again going to
pentium 4 pipeline length levels of absurdity, as meltdown/spectre
demonstrates (140 instructions of prefetch!??!?). Maybe intel will make
9 nanometer manufacturing work, but atomic limits are already an issue.

The problem with 1 second timestamps was you honestly could confuse
"make" about which file was newer once an exec() could complete in the
same second having done real work. That was the motivating issue causing
the change, going to nanoseconds was just the big hammer of "this is
large enough it won't matter again in our lifetimes". But nanosecond
time stamps are recording more jitter than useful information, and that
seems unlikely to change this century?
--
Rob Landley Jan. 26, 2018, 2:40 a.m. UTC | #6
On 01/24/2018 09:27 PM, Taras Kondratiuk wrote:
> diff --git a/Documentation/early-userspace/buffer-format.txt b/Documentation/early-userspace/buffer-format.txt
> index e1fd7f9dad16..d818df4f72dc 100644
> --- a/Documentation/early-userspace/buffer-format.txt
> +++ b/Documentation/early-userspace/buffer-format.txt

> +compressed and/or uncompressed cpio archives; arbitrary amounts
> +zero bytes (for padding) can be added between members.

Missing "of" between amounts and zero. (Yeah it was in the original, but
if you're touching it anyway...)

> +c_xattrs_size  8 bytes		 Size of xattrs field
> +
> +Most of the fields match cpio_newc_header except c_mtime that contains
> +microseconds. c_chksum field is dropped.
> +
> +xattr_size is a total size of xattr_entry including 8 bytes of
> +xattr_size. xattr_size has the same hexadecimal ASCII encoding as other
> +fields of cpio header.

xattrs_size or xattr_size?

Total nitpicks, I know. :)

Rob
--
Arnd Bergmann Jan. 26, 2018, 9:04 a.m. UTC | #7
On Fri, Jan 26, 2018 at 3:39 AM, Rob Landley <rob@landley.net> wrote:

> The problem with 1 second timestamps was you honestly could confuse
> "make" about which file was newer once an exec() could complete in the
> same second having done real work. That was the motivating issue causing
> the change, going to nanoseconds was just the big hammer of "this is
> large enough it won't matter again in our lifetimes". But nanosecond
> time stamps are recording more jitter than useful information, and that
> seems unlikely to change this century?

Sure, the only thing we really need the nanosecond timestamp for is
to keep them identical. E.g. if you use cpio to make an exact copy
of a file system, using microseconds timestamps will round all mtime
values. If you then use 'rsync' to compare/update the two copies
without passing a --modify-window= or --size-only, it will have
to read all files in rather than skipping those with identical size and
mtime.
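
The effect described above can be sketched as follows. The timestamp is an
illustrative value, the function names are hypothetical, and the tolerance
check only mimics the idea of a comparison window; it is not rsync's
actual implementation.

```python
def truncate_to_usec(mtime_ns):
    """Drop the sub-microsecond part, as a microsecond field would."""
    return mtime_ns - (mtime_ns % 1000)


def same_within(a_ns, b_ns, window_ns):
    """Treat two timestamps as equal if they differ by at most window_ns."""
    return abs(a_ns - b_ns) <= window_ns


original_ns = 1516957308811135084   # illustrative nanosecond mtime
restored_ns = truncate_to_usec(original_ns)

exact_match = original_ns == restored_ns                   # strict compare
window_match = same_within(original_ns, restored_ns, 1000)  # 1 usec window
```

A strict comparison fails unless the last three decimal digits of the
original timestamp happened to be zero, which is why a tool comparing
size and mtime exactly would re-read every file.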

Side note: the default behavior for file systems is actually to only use
the coarse timestamps of the last timer tick, so you actually do get
identical timestamps in practice, plus six digits of nonsense:

(on tmpfs)
 $ for i in {000..999} ; do > $i ; done; stat --format="%y" *  | uniq -c
     86 2018-01-26 10:01:48.811135084 +0100
    469 2018-01-26 10:01:48.815135143 +0100
    445 2018-01-26 10:01:48.819135201 +0100

         Arnd
--
Henrique de Moraes Holschuh Jan. 26, 2018, 10:31 a.m. UTC | #8
On Thu, 25 Jan 2018, Rob Landley wrote:
> That said, I don't think -h newcx should emit (or recognize) the
> "TRAILER!!!1!" entry. That's kinda silly in-band signaling for 2018:
> files have a length, pipes provide EOF, and each cpiox entry starts with
> 6 bytes of c_magic anyway. (I stopped toybox from producing the TRAILER
> entry back in june, toybox commit 32550751997d, and the kernel consumes
> the resulting cpio just fine. All the trailer does is prevent you from
> concatenating cpio files, which is a feature multiple people asked me for.)

Not in the kernel.  What TRAILER does in the kernel is to act as a
barrier for the hardlink creation state, which IS a good thing.  You
could just specify it as such for "newcx".

The kernel will continue reading for more entries after TRAILER, so
concatenation is not broken by TRAILER.  It is also insensitive to
NUL-padding length (as long as it is 4-byte aligned), which is another
nice feature you could specify for "newcx".

Also, the kernel does something nothing in userspace ever tried to,
AFAIK: it detects compression signatures along with the CPIO header
signatures, and thus it can take several compressed and uncompressed
archives concatenated together (and the compressor doesn't need to be
the same, either).
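
As a rough illustration of the signature detection described above, the
sketch below classifies the start of a segment by its leading bytes after
skipping NUL padding. It is not the kernel's code: the function name is
hypothetical, only the gzip and xz signatures are checked, and "070703" is
the magic proposed by this patch rather than an established one.

```python
CPIO_MAGICS = (b"070701", b"070702", b"070703")  # newc, crc, proposed newcx
GZIP_MAGIC = b"\x1f\x8b"
XZ_MAGIC = b"\xfd7zXZ\x00"


def classify_segment(buf):
    """Return 'cpio', 'gzip', 'xz' or 'unknown' for the start of a segment."""
    data = buf.lstrip(b"\0")  # NUL padding between members is allowed
    if data[:6] in CPIO_MAGICS:
        return "cpio"
    if data.startswith(GZIP_MAGIC):
        return "gzip"
    if data.startswith(XZ_MAGIC):
        return "xz"
    return "unknown"


kinds = [
    classify_segment(b"\0\0\0\0" + b"070701" + b"0" * 104),
    classify_segment(b"\x1f\x8b\x08\x00"),
    classify_segment(b"hello"),
]
```

A real unpacker would dispatch to the matching decompressor and then feed
the decompressed stream back into the cpio parser.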
Victor Kamensky (kamensky) Jan. 26, 2018, 3:51 p.m. UTC | #9
On Fri, 26 Jan 2018, Henrique de Moraes Holschuh wrote:

> On Thu, 25 Jan 2018, Rob Landley wrote:
>> That said, I don't think -h newcx should emit (or recognize) the
>> "TRAILER!!!1!" entry. That's kinda silly in-band signaling for 2018:
>> files have a length, pipes provide EOF, and each cpiox entry starts with
>> 6 bytes of c_magic anyway.

My understanding is that TRAILER is really used on tape devices,
there is no notion of file end in this case, it is just a stream of bytes
from char device.

Thanks,
Victor

>> (I stopped toybox from producing the TRAILER
>> entry back in june, toybox commit 32550751997d, and the kernel consumes
>> the resulting cpio just fine. All the trailer does is prevent you from
>> concatenating cpio files, which is a feature multiple people asked me for.)
>
> Not in the kernel.  What TRAILER does in the kernel is to act as a
> barrier for the hardlink creation state, which IS a good thing.  You
> could just specify it as such for "newcx".
>
> The kernel will continue reading for more entries after TRAILER, so
> concatenation is not broken by TRAILER.  It is also insensitive to
> NUL-padding length (as long as it is 4-byte aligned), which is another
> nice feature you could specify for "newcx".
>
> Also, the kernel does something nothing in userspace ever tried to,
> AFAIK: it detects compression signatures along with the CPIO header
> signatures, and thus it can take several compressed and uncompressed
> archives concatenated together (and the compressor doesn't need to be
> the same, either).
> --
>  Henrique Holschuh
>
--
Henrique de Moraes Holschuh Jan. 26, 2018, 6:15 p.m. UTC | #10
On Fri, 26 Jan 2018, Victor Kamensky wrote:
> On Fri, 26 Jan 2018, Henrique de Moraes Holschuh wrote:
> > On Thu, 25 Jan 2018, Rob Landley wrote:
> > > That said, I don't think -h newcx should emit (or recognize) the
> > > "TRAILER!!!1!" entry. That's kinda silly in-band signaling for 2018:
> > > files have a length, pipes provide EOF, and each cpiox entry starts with
> > > 6 bytes of c_magic anyway.
> 
> My understanding is that TRAILER is really used on tape devices,
> there is no notion of file end in this case, it is just a stream of bytes
> from char device.

TRAILER is really used anywhere you can have several cpio archives
concatenated, which is the exact case of a Linux initramfs, not just
tape.

The initramfs format takes *one or more* cpio archives, concatenated.
Each archive may be independently compressed (using whatever supported
compression method), or uncompressed[1].  EOF or size information can
only tell you where the entire concatenated archive ends, not where each
"segment" (independent cpio archive that was concatenated into the
whole) ends.

TRAILER is the only decent way to know the concatenation points.
Knowing where these points are is necessary for the kernel, due to the
way hardlink encoding is done on cpio archives: one has to reset the
state of the hardlink-tracking table between cpio archives that were
concatenated, for safety (and sysadmin sanity) reasons.

[1] for the special case when one includes an "early initramfs" section
for firmware (microcode, etc) updates, the archive(s) containing the
firmware data must be uncompressed, and these archives must come before
compressed archives in the concatenation.
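
The barrier behaviour described above can be sketched with a simplified
model. Entries are reduced to (name, inode) pairs and the function name is
hypothetical; a real unpacker would also take device numbers and link
counts into account, so treat this purely as an illustration.

```python
TRAILER = "TRAILER!!!"


def resolve_hardlinks(entries):
    """Map each later name to the first name with the same inode number,
    forgetting all inode numbers whenever a TRAILER entry is seen."""
    links = {}
    seen = {}  # inode number -> first name seen in the current archive
    for name, ino in entries:
        if name == TRAILER:
            seen.clear()  # barrier: next archive starts with a clean table
            continue
        if ino in seen:
            links[name] = seen[ino]
        else:
            seen[ino] = name
    return links


with_trailer = resolve_hardlinks(
    [("a", 1), ("b", 1), (TRAILER, 0), ("c", 1)])
without_trailer = resolve_hardlinks(
    [("a", 1), ("b", 1), ("c", 1)])
```

With the trailer in place, "c" from the second archive is not linked to
"a" from the first even though both carry inode number 1; without it, the
reused inode number produces an unintended link.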
Taras Kondratiuk Jan. 26, 2018, 9:02 p.m. UTC | #11
Quoting Rob Landley (2018-01-25 18:40:54)
> On 01/24/2018 09:27 PM, Taras Kondratiuk wrote:
> > diff --git a/Documentation/early-userspace/buffer-format.txt b/Documentation/early-userspace/buffer-format.txt
> > index e1fd7f9dad16..d818df4f72dc 100644
> > --- a/Documentation/early-userspace/buffer-format.txt
> > +++ b/Documentation/early-userspace/buffer-format.txt
> 
> > +compressed and/or uncompressed cpio archives; arbitrary amounts
> > +zero bytes (for padding) can be added between members.
> 
> Missing "of" between amounts and zero. (Yeah it was in the original, but
> if you're touching it anyway...)
> 
> > +c_xattrs_size  8 bytes                Size of xattrs field
> > +
> > +Most of the fields match cpio_newc_header except c_mtime that contains
> > +microseconds. c_chksum field is dropped.
> > +
> > +xattr_size is a total size of xattr_entry including 8 bytes of
> > +xattr_size. xattr_size has the same hexadecimal ASCII encoding as other
> > +fields of cpio header.
> 
> xattrs_size or xattr_size?
> 
> Total nitpicks, I know. :)

xattr_size here refers to the size of each xattr_entry:
xattr_entry := xattr_size[8] + xattr_name + "\0" + xattr_value

I'll move this paragraph closer to that line.
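
A minimal sketch of the xattr_entry layout above, assuming xattr_size is
8 hexadecimal ASCII digits and counts the whole entry including those 8
bytes, as the quoted documentation states. The function names are
hypothetical and the ALGN(4) padding around the xattrs field is left out.

```python
def encode_xattr_entry(name, value):
    """Build xattr_size[8] + xattr_name + "\\0" + xattr_value."""
    body = name.encode("utf-8") + b"\0" + value
    total = 8 + len(body)  # xattr_size includes its own 8 bytes
    return format(total, "08x").encode("ascii") + body


def decode_xattrs(buf):
    """Split a concatenation of xattr entries into (name, value) pairs."""
    entries = []
    offset = 0
    while offset < len(buf):
        total = int(buf[offset:offset + 8].decode("ascii"), 16)
        if total < 9 or offset + total > len(buf):
            raise ValueError("bad xattr_size at offset {}".format(offset))
        body = buf[offset + 8:offset + total]
        name, sep, value = body.partition(b"\0")
        if not sep:
            raise ValueError("xattr name is not NUL-terminated")
        entries.append((name.decode("utf-8"), value))
        offset += total
    return entries


blob = (encode_xattr_entry("security.ima", b"\x03\x01")
        + encode_xattr_entry("user.note", b""))
decoded = decode_xattrs(blob)
```

Only the first NUL separates name from value, so a value may itself
contain NUL bytes; the length prefix is what delimits the entry.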
--
Patch

diff --git a/Documentation/early-userspace/buffer-format.txt b/Documentation/early-userspace/buffer-format.txt
index e1fd7f9dad16..d818df4f72dc 100644
--- a/Documentation/early-userspace/buffer-format.txt
+++ b/Documentation/early-userspace/buffer-format.txt
@@ -24,6 +24,7 @@  grammar, where:
 	+	indicates concatenation
 	GZIP()	indicates the gzip(1) of the operand
 	ALGN(n)	means padding with null bytes to an n-byte boundary
+	[n]	means size of field is n bytes
 
 	initramfs  := ("\0" | cpio_archive | cpio_gzip_archive)*
 
@@ -31,20 +32,29 @@  grammar, where:
 
 	cpio_archive := cpio_file* + (<nothing> | cpio_trailer)
 
-	cpio_file := ALGN(4) + cpio_header + filename + "\0" + ALGN(4) + data
+	cpio_file := (cpio_newc_file | cpio_newcx_file)
+
+	cpio_newc_file := ALGN(4) + cpio_newc_header + filename + "\0" + \
+			  ALGN(4) + data
+
+	cpio_newcx_file := ALGN(4) + cpio_newcx_header + filename + "\0" + \
+			   ALGN(4) + xattrs + ALGN(4) + data
+
+	xattrs := xattr_entry*
+
+	xattr_entry := xattr_size[8] + xattr_name + "\0" + xattr_value
 
 	cpio_trailer := ALGN(4) + cpio_header + "TRAILER!!!\0" + ALGN(4)
 
 
 In human terms, the initramfs buffer contains a collection of
-compressed and/or uncompressed cpio archives (in the "newc" or "crc"
-formats); arbitrary amounts zero bytes (for padding) can be added
-between members.
+compressed and/or uncompressed cpio archives; arbitrary amounts
+zero bytes (for padding) can be added between members.
 
 The cpio "TRAILER!!!" entry (cpio end-of-archive) is optional, but is
 not ignored; see "handling of hard links" below.
 
-The structure of the cpio_header is as follows (all fields contain
+The structure of the cpio_newc_header is as follows (all fields contain
 hexadecimal ASCII numbers fully padded with '0' on the left to the
 full width of the field, for example, the integer 4780 is represented
 by the ASCII string "000012ac"):
@@ -81,6 +91,32 @@  algorithm used.
 If the filename is "TRAILER!!!" this is actually an end-of-archive
 marker; the c_filesize for an end-of-archive marker must be zero.
 
+"Extended" newc format (newcx)
+"newcx" cpio format extends "newc" by increasing size of some fields
+and adding extended attributes support. cpio_newcx_header structure:
+
+Field name    Field size	 Meaning
+c_magic	       6 bytes		 The string "070703"
+c_ino	       8 bytes		 File inode number
+c_mode	       8 bytes		 File mode and permissions
+c_uid	       8 bytes		 File uid
+c_gid	       8 bytes		 File gid
+c_nlink	       8 bytes		 Number of links
+c_mtime	      16 bytes		 Modification time (microseconds)
+c_filesize    16 bytes		 Size of data field
+c_maj	       8 bytes		 Major part of file device number
+c_min	       8 bytes		 Minor part of file device number
+c_rmaj	       8 bytes		 Major part of device node reference
+c_rmin	       8 bytes		 Minor part of device node reference
+c_namesize     8 bytes		 Length of filename, including final \0
+c_xattrs_size  8 bytes		 Size of xattrs field
+
+Most of the fields match cpio_newc_header except c_mtime that contains
+microseconds. c_chksum field is dropped.
+
+xattr_size is a total size of xattr_entry including 8 bytes of
+xattr_size. xattr_size has the same hexadecimal ASCII encoding as other
+fields of cpio header.
 
 *** Handling of hard links