diff mbox

[v3,01/15] Documentation: add newcx initramfs format description

Message ID 1518813234-5874-2-git-send-email-takondra@cisco.com (mailing list archive)
State New, archived
Headers show

Commit Message

Taras Kondratiuk Feb. 16, 2018, 8:33 p.m. UTC
Many of the Linux security/integrity features are dependent on file
metadata, stored as extended attributes (xattrs), for making decisions.
These features need to be initialized during initcall and enabled as
early as possible for complete security coverage.

Initramfs (tmpfs) supports xattrs, but newc CPIO archive format does not
support including them into the archive.

This patch describes "extended" newc format (newcx) that is based on
newc and has following changes:
- extended attributes support
- increased size of filesize to support files >4GB
- increased mtime field size to have 64 bits of seconds and added a
  field for nanoseconds
- removed unused checksum field

Signed-off-by: Taras Kondratiuk <takondra@cisco.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Victor Kamensky <kamensky@cisco.com>
---
 Documentation/early-userspace/buffer-format.txt | 46 ++++++++++++++++++++++---
 1 file changed, 41 insertions(+), 5 deletions(-)

Comments

H. Peter Anvin Feb. 16, 2018, 8:59 p.m. UTC | #1
On 02/16/18 12:33, Taras Kondratiuk wrote:
> Many of the Linux security/integrity features are dependent on file
> metadata, stored as extended attributes (xattrs), for making decisions.
> These features need to be initialized during initcall and enabled as
> early as possible for complete security coverage.
> 
> Initramfs (tmpfs) supports xattrs, but newc CPIO archive format does not
> support including them into the archive.
> 
> This patch describes "extended" newc format (newcx) that is based on
> newc and has following changes:
> - extended attributes support
> - increased size of filesize to support files >4GB
> - increased mtime field size to have 64 bits of seconds and added a
>   field for nanoseconds
> - removed unused checksum field
> 

If you are going to implement a new, non-backwards-compatible format,
you shouldn't replicate the mistakes of the current format.  Specifically:

1. The use of ASCII-encoded fixed-length numbers is an idiotic legacy
from an era before there were any portable way of dealing with numbers
with prespecified endianness.  If you are going to use ASCII, make them
delimited so that they don't have fixed limits, or just use binary.

The cpio header isn't fixed size, so that argument goes away, in fact
the only way to determine the end of the header is to scan forward.

2. Alignment sensitivity!  Because there is no header length
information, the above scan tells you where the header ends, but there
is padding before the data, and the size of that padding is only defined
by alignment.

3. Inband encoding of EOF: if you actually have a filename "TRAILER!!!"
you have problems.

But first, before you define a whole new format for which no tools exist
(you will have to work with the maintainers of the GNU tools to add
support) you should see how complex it would be to support the POSIX
tar/pax format, which already has all the features you are seeking, and
by now is well-supported.

	-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rob Landley Feb. 16, 2018, 9:25 p.m. UTC | #2
On 02/16/2018 02:59 PM, H. Peter Anvin wrote:
> On 02/16/18 12:33, Taras Kondratiuk wrote:
>> Many of the Linux security/integrity features are dependent on file
>> metadata, stored as extended attributes (xattrs), for making decisions.
>> These features need to be initialized during initcall and enabled as
>> early as possible for complete security coverage.
>>
>> Initramfs (tmpfs) supports xattrs, but newc CPIO archive format does not
>> support including them into the archive.
>>
>> This patch describes "extended" newc format (newcx) that is based on
>> newc and has following changes:
>> - extended attributes support
>> - increased size of filesize to support files >4GB
>> - increased mtime field size to have 64 bits of seconds and added a
>>   field for nanoseconds
>> - removed unused checksum field
>>
> 
> If you are going to implement a new, non-backwards-compatible format,
> you shouldn't replicate the mistakes of the current format.  Specifically:

So rather than make minimal changes to the existing format and continue to
support the existing format (sharing as much code as possible), you recommend
gratuitous aesthetic changes?

> 1. The use of ASCII-encoded fixed-length numbers is an idiotic legacy
> from an era before there were any portable way of dealing with numbers
> with prespecified endianness.

It lets encoders and decoders easily share code with the existing cpio format,
which we still intend to be able to read and write.

> If you are going to use ASCII, make them
> delimited so that they don't have fixed limits, or just use binary.

When it's gzipped this accomplishes what? (Other than being gratuitously
different from the previous iteration?)

> The cpio header isn't fixed size, so that argument goes away, in fact
> the only way to determine the end of the header is to scan forward.
> 
> 2. Alignment sensitivity!  Because there is no header length
> information, the above scan tells you where the header ends, but there
> is padding before the data, and the size of that padding is only defined
> by alignment.

Again, these are minimal changes to the existing cpio format. You're complaining
about _cpio_, and that the new stuff isn't _different_ enough from it.

> 3. Inband encoding of EOF: if you actually have a filename "TRAILER!!!"
> you have problems.

Been there, done that:

http://lkml.iu.edu/hypermail/linux/kernel/1801.3/01791.html

> But first, before you define a whole new format for which no tools exist
> (you will have to work with the maintainers of the GNU tools to add
> support)

No, he's been working with the maintainer of toybox to add support (for about a
year now), which gets him the Android command line. And the kernel has its own
built-in tool to generate cpio images anyway.

Why would anyone care what the GNU project thinks?

> you should see how complex it would be to support the POSIX
> tar/pax format,

That argument was had (at length) when initramfs went in over a decade ago.
There are links in Documentation/filesystems/ramfs-rootfs-initramfs.txt to the
mailing list entries about it.

> which already has all the features you are seeking, and
> by now is well-supported.

So... tar wasn't well-supported 15 years ago? (Hasn't the kernel source always
been distributed via tarball back since 0.0.1?)

You're suggesting having a whole second codepath that shares no code with the
existing cpio extractor. Are you suggesting abandoning support for the existing
initramfs.cpio.gz file format?

Rob
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Victor Kamensky (kamensky) Feb. 16, 2018, 9:47 p.m. UTC | #3
On Fri, 16 Feb 2018, Rob Landley wrote:

>
> On 02/16/2018 02:59 PM, H. Peter Anvin wrote:
>> On 02/16/18 12:33, Taras Kondratiuk wrote:
>>> Many of the Linux security/integrity features are dependent on file
>>> metadata, stored as extended attributes (xattrs), for making decisions.
>>> These features need to be initialized during initcall and enabled as
>>> early as possible for complete security coverage.
>>>
>>> Initramfs (tmpfs) supports xattrs, but newc CPIO archive format does not
>>> support including them into the archive.
>>>
>>> This patch describes "extended" newc format (newcx) that is based on
>>> newc and has following changes:
>>> - extended attributes support
>>> - increased size of filesize to support files >4GB
>>> - increased mtime field size to have 64 bits of seconds and added a
>>>   field for nanoseconds
>>> - removed unused checksum field
>>>
>>
>> If you are going to implement a new, non-backwards-compatible format,
>> you shouldn't replicate the mistakes of the current format.  Specifically:
>
> So rather than make minimal changes to the existing format and continue to
> support the existing format (sharing as much code as possible), you recommend
> gratuitous aesthetic changes?
>
>> 1. The use of ASCII-encoded fixed-length numbers is an idiotic legacy
>> from an era before there were any portable way of dealing with numbers
>> with prespecified endianness.
>
> It lets encoders and decoders easily share code with the existing cpio format,
> which we still intend to be able to read and write.
>
>> If you are going to use ASCII, make them
>> delimited so that they don't have fixed limits, or just use binary.
>
> When it's gzipped this accomplishes what? (Other than being gratuitously
> different from the previous iteration?)
>
>> The cpio header isn't fixed size, so that argument goes away, in fact
>> the only way to determine the end of the header is to scan forward.
>>
>> 2. Alignment sensitivity!  Because there is no header length
>> information, the above scan tells you where the header ends, but there
>> is padding before the data, and the size of that padding is only defined
>> by alignment.
>
> Again, these are minimal changes to the existing cpio format. You're complaining
> about _cpio_, and that the new stuff isn't _different_ enough from it.
>
>> 3. Inband encoding of EOF: if you actually have a filename "TRAILER!!!"
>> you have problems.
>
> Been there, done that:
>
> http://lkml.iu.edu/hypermail/linux/kernel/1801.3/01791.html
>
>> But first, before you define a whole new format for which no tools exist
>> (you will have to work with the maintainers of the GNU tools to add
>> support)
>
> No, he's been working with the maintainer of toybox to add support (for about a
> year now), which gets him the Android command line. And the kernel has its own
> built-in tool to generate cpio images anyway.
>
> Why would anyone care what the GNU project thinks?

In our internal use of this patch series we do use gnu cpio
to create initramfs.cpio.

And reference to gnu cpio patch that supports newcx format is
posted in description for this serieis:

https://raw.githubusercontent.com/victorkamensky/initramfs-xattrs-poky/rocko/meta/recipes-extended/cpio/cpio-2.12/cpio-xattrs.patch

Whether GNU cpio maintainers will accept it is different matter.
We will try, but we need to start somewhere and agree on
new format first.

Thanks,
Victor

>> you should see how complex it would be to support the POSIX
>> tar/pax format,
>
> That argument was had (at length) when initramfs went in over a decade ago.
> There are links in Documentation/filesystems/ramfs-rootfs-initramfs.txt to the
> mailing list entries about it.
>
>> which already has all the features you are seeking, and
>> by now is well-supported.
>
> So... tar wasn't well-supported 15 years ago? (Hasn't the kernel source always
> been distributed via tarball back since 0.0.1?)
>
> You're suggesting having a whole second codepath that shares no code with the
> existing cpio extractor. Are you suggesting abandoning support for the existing
> initramfs.cpio.gz file format?
>
> Rob
>
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
H. Peter Anvin Feb. 17, 2018, midnight UTC | #4
On February 16, 2018 1:47:35 PM PST, Victor Kamensky <kamensky@cisco.com> wrote:
>
>
>On Fri, 16 Feb 2018, Rob Landley wrote:
>
>>
>> On 02/16/2018 02:59 PM, H. Peter Anvin wrote:
>>> On 02/16/18 12:33, Taras Kondratiuk wrote:
>>>> Many of the Linux security/integrity features are dependent on file
>>>> metadata, stored as extended attributes (xattrs), for making
>decisions.
>>>> These features need to be initialized during initcall and enabled
>as
>>>> early as possible for complete security coverage.
>>>>
>>>> Initramfs (tmpfs) supports xattrs, but newc CPIO archive format
>does not
>>>> support including them into the archive.
>>>>
>>>> This patch describes "extended" newc format (newcx) that is based
>on
>>>> newc and has following changes:
>>>> - extended attributes support
>>>> - increased size of filesize to support files >4GB
>>>> - increased mtime field size to have 64 bits of seconds and added a
>>>>   field for nanoseconds
>>>> - removed unused checksum field
>>>>
>>>
>>> If you are going to implement a new, non-backwards-compatible
>format,
>>> you shouldn't replicate the mistakes of the current format. 
>Specifically:
>>
>> So rather than make minimal changes to the existing format and
>continue to
>> support the existing format (sharing as much code as possible), you
>recommend
>> gratuitous aesthetic changes?
>>
>>> 1. The use of ASCII-encoded fixed-length numbers is an idiotic
>legacy
>>> from an era before there were any portable way of dealing with
>numbers
>>> with prespecified endianness.
>>
>> It lets encoders and decoders easily share code with the existing
>cpio format,
>> which we still intend to be able to read and write.
>>
>>> If you are going to use ASCII, make them
>>> delimited so that they don't have fixed limits, or just use binary.
>>
>> When it's gzipped this accomplishes what? (Other than being
>gratuitously
>> different from the previous iteration?)
>>
>>> The cpio header isn't fixed size, so that argument goes away, in
>fact
>>> the only way to determine the end of the header is to scan forward.
>>>
>>> 2. Alignment sensitivity!  Because there is no header length
>>> information, the above scan tells you where the header ends, but
>there
>>> is padding before the data, and the size of that padding is only
>defined
>>> by alignment.
>>
>> Again, these are minimal changes to the existing cpio format. You're
>complaining
>> about _cpio_, and that the new stuff isn't _different_ enough from
>it.
>>
>>> 3. Inband encoding of EOF: if you actually have a filename
>"TRAILER!!!"
>>> you have problems.
>>
>> Been there, done that:
>>
>> http://lkml.iu.edu/hypermail/linux/kernel/1801.3/01791.html
>>
>>> But first, before you define a whole new format for which no tools
>exist
>>> (you will have to work with the maintainers of the GNU tools to add
>>> support)
>>
>> No, he's been working with the maintainer of toybox to add support
>(for about a
>> year now), which gets him the Android command line. And the kernel
>has its own
>> built-in tool to generate cpio images anyway.
>>
>> Why would anyone care what the GNU project thinks?
>
>In our internal use of this patch series we do use gnu cpio
>to create initramfs.cpio.
>
>And reference to gnu cpio patch that supports newcx format is
>posted in description for this serieis:
>
>https://raw.githubusercontent.com/victorkamensky/initramfs-xattrs-poky/rocko/meta/recipes-extended/cpio/cpio-2.12/cpio-xattrs.patch
>
>Whether GNU cpio maintainers will accept it is different matter.
>We will try, but we need to start somewhere and agree on
>new format first.
>
>Thanks,
>Victor
>
>>> you should see how complex it would be to support the POSIX
>>> tar/pax format,
>>
>> That argument was had (at length) when initramfs went in over a
>decade ago.
>> There are links in
>Documentation/filesystems/ramfs-rootfs-initramfs.txt to the
>> mailing list entries about it.
>>
>>> which already has all the features you are seeking, and
>>> by now is well-supported.
>>
>> So... tar wasn't well-supported 15 years ago? (Hasn't the kernel
>source always
>> been distributed via tarball back since 0.0.1?)
>>
>> You're suggesting having a whole second codepath that shares no code
>with the
>> existing cpio extractor. Are you suggesting abandoning support for
>the existing
>> initramfs.cpio.gz file format?
>>
>> Rob
>>

Introducing new, incompatible data formats is an inherently *very* costly operation; unfortunately many engineers don't seem to have a good grip of just *how* expensive it is (see "silly embedded nonsense hacks", "too little, too soon".)

Cpio itself is a great horror show of just how bad this gets: a bunch of minor tweaks without finding underlying design bugs resulting in a ton of mutually incompatible formats.  "They are almost the same" doesn't help: they are still incompatible.

Introducing a new incompatible data format without strong justification is engineering malpractice.  Doing it under the non-justification of expedience ("oh, we can share most of the code") is aggravated engineering malpractice.

It is entirely possible that the modern posix tar/pax format is too complex to be practical in this case – that would be justifying a new format.  But then you are taking the fundamental cost of breakage, and then the new format definitely should not be replicating known defects of another format and without at least some thought about how to avoid it in the future.
Taras Kondratiuk Feb. 17, 2018, 10:04 a.m. UTC | #5
Quoting hpa@zytor.com (2018-02-16 16:00:36)
> On February 16, 2018 1:47:35 PM PST, Victor Kamensky <kamensky@cisco.com> wrote:
> >
> >
> >On Fri, 16 Feb 2018, Rob Landley wrote:
> >
> >>
> >> On 02/16/2018 02:59 PM, H. Peter Anvin wrote:
> >>> On 02/16/18 12:33, Taras Kondratiuk wrote:
> >>>> Many of the Linux security/integrity features are dependent on file
> >>>> metadata, stored as extended attributes (xattrs), for making
> >decisions.
> >>>> These features need to be initialized during initcall and enabled
> >as
> >>>> early as possible for complete security coverage.
> >>>>
> >>>> Initramfs (tmpfs) supports xattrs, but newc CPIO archive format
> >does not
> >>>> support including them into the archive.
> >>>>
> >>>> This patch describes "extended" newc format (newcx) that is based
> >on
> >>>> newc and has following changes:
> >>>> - extended attributes support
> >>>> - increased size of filesize to support files >4GB
> >>>> - increased mtime field size to have 64 bits of seconds and added a
> >>>>   field for nanoseconds
> >>>> - removed unused checksum field
> >>>>
> >>>
> >>> If you are going to implement a new, non-backwards-compatible
> >format,
> >>> you shouldn't replicate the mistakes of the current format. 
> >Specifically:
> >>
> >> So rather than make minimal changes to the existing format and
> >continue to
> >> support the existing format (sharing as much code as possible), you
> >recommend
> >> gratuitous aesthetic changes?
> >>
> >>> 1. The use of ASCII-encoded fixed-length numbers is an idiotic
> >legacy
> >>> from an era before there were any portable way of dealing with
> >numbers
> >>> with prespecified endianness.
> >>
> >> It lets encoders and decoders easily share code with the existing
> >cpio format,
> >> which we still intend to be able to read and write.
> >>
> >>> If you are going to use ASCII, make them
> >>> delimited so that they don't have fixed limits, or just use binary.
> >>
> >> When it's gzipped this accomplishes what? (Other than being
> >gratuitously
> >> different from the previous iteration?)
> >>
> >>> The cpio header isn't fixed size, so that argument goes away, in
> >fact
> >>> the only way to determine the end of the header is to scan forward.
> >>>
> >>> 2. Alignment sensitivity!  Because there is no header length
> >>> information, the above scan tells you where the header ends, but
> >there
> >>> is padding before the data, and the size of that padding is only
> >defined
> >>> by alignment.
> >>
> >> Again, these are minimal changes to the existing cpio format. You're
> >complaining
> >> about _cpio_, and that the new stuff isn't _different_ enough from
> >it.
> >>
> >>> 3. Inband encoding of EOF: if you actually have a filename
> >"TRAILER!!!"
> >>> you have problems.
> >>
> >> Been there, done that:
> >>
> >> http://lkml.iu.edu/hypermail/linux/kernel/1801.3/01791.html
> >>
> >>> But first, before you define a whole new format for which no tools
> >exist
> >>> (you will have to work with the maintainers of the GNU tools to add
> >>> support)
> >>
> >> No, he's been working with the maintainer of toybox to add support
> >(for about a
> >> year now), which gets him the Android command line. And the kernel
> >has its own
> >> built-in tool to generate cpio images anyway.
> >>
> >> Why would anyone care what the GNU project thinks?
> >
> >In our internal use of this patch series we do use gnu cpio
> >to create initramfs.cpio.
> >
> >And reference to gnu cpio patch that supports newcx format is
> >posted in description for this serieis:
> >
> >https://raw.githubusercontent.com/victorkamensky/initramfs-xattrs-poky/rocko/meta/recipes-extended/cpio/cpio-2.12/cpio-xattrs.patch
> >
> >Whether GNU cpio maintainers will accept it is different matter.
> >We will try, but we need to start somewhere and agree on
> >new format first.
> >
> >Thanks,
> >Victor
> >
> >>> you should see how complex it would be to support the POSIX
> >>> tar/pax format,
> >>
> >> That argument was had (at length) when initramfs went in over a
> >decade ago.
> >> There are links in
> >Documentation/filesystems/ramfs-rootfs-initramfs.txt to the
> >> mailing list entries about it.
> >>
> >>> which already has all the features you are seeking, and
> >>> by now is well-supported.
> >>
> >> So... tar wasn't well-supported 15 years ago? (Hasn't the kernel
> >source always
> >> been distributed via tarball back since 0.0.1?)
> >>
> >> You're suggesting having a whole second codepath that shares no code
> >with the
> >> existing cpio extractor. Are you suggesting abandoning support for
> >the existing
> >> initramfs.cpio.gz file format?
> >>
> >> Rob
> >>
> 
> Introducing new, incompatible data formats is an inherently *very* costly operation; unfortunately many engineers don't seem to have a good grip of just *how* expensive it is (see "silly embedded nonsense hacks", "too little, too soon".)
> 
> Cpio itself is a great horror show of just how bad this gets: a bunch of minor tweaks without finding underlying design bugs resulting in a ton of mutually incompatible formats.  "They are almost the same" doesn't help: they are still incompatible.
> 
> Introducing a new incompatible data format without strong justification is engineering malpractice.  Doing it under the non-justification of expedience ("oh, we can share most of the code") is aggravated engineering malpractice.
> 
> It is entirely possible that the modern posix tar/pax format is too complex to be practical in this case – that would be justifying a new format.  But then you are taking the fundamental cost of breakage, and then the new format definitely should not be replicating known defects of another format and without at least some thought about how to avoid it in the future.

I do understand a cost of adding a new format and I'd be very happy not
to do it if there is a better option. I did consider using tar/pax, but
looks like it was already discussed in 2001 between you and Al Viro [1]
and tar was rejected.

My main tar concerns:
- ustar+pax header is *huge*. E.g. directory entry in archive: pax 1536
  bytes vs cpio <200 bytes. Overall compressed initramfs size increase
  is not significant though.
- pax is not a strict format. E.g. xattrs may be stored under different
  names: SHCILY.xattr (GNU tar, star) vs LIBARCHIVE.xattr (libarchive).

I'm not sure which option is better. Adding tar to the kernel or adding
new cpio format into several tools (GNU cpio, libarchive, busybox,
toybox) will result in approximately the same amount of code.

It would be nice to get Al Viro's thoughts on this.

[1] https://web.archive.org/web/20060909041730/http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1638.html
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rob Landley Feb. 17, 2018, 5:32 p.m. UTC | #6
On 02/16/2018 06:00 PM, hpa@zytor.com wrote:
> Introducing new, incompatible data formats is an inherently *very*
> costly operation; unfortunately many engineers don't seem to have a good grip
> of just *how* expensive it is (see "silly embedded nonsense hacks", "too
> little, too soon".)

So your argument is we should use the _existing_ cpio format that supports xattrs?

You keep bringing up the embedded world as a thing you don't understand and is
thus bad. I remember when you dismissed "I would like to constrain my
cross-compiling dependencies to a minimal set" as a... what did you call it, a
silly academic exercise? (Googles...)

https://lkml.org/lkml/2008/2/15/548

> Cpio itself is a great horror show of just how bad this gets:

That's not what you said last time?

http://lkml.iu.edu/hypermail/linux/kernel/0112.2/1540.html

> Introducing a new incompatible data format without strong justification

Here's you suggesting a new format when initramfs first went in, because you
disliked _both_ tar and cpio:

http://lkml.iu.edu/hypermail/linux/kernel/0112.2/1587.html

Seriously, there is a "why cpio rather than tar" section of
https://www.kernel.org/doc/Documentation/filesystems/ramfs-rootfs-initramfs.txt
with links to the messages. (www.uwsg became lkml in the links, I should submit
a patch fixing that, it redirected 6 months ago...)

We've _had_ this argument already. You are not bringing up _new_ arguments.

This patch set is because people want xattrs in initramfs. I still don't
personally understand why they want this, but they do. We need to still support
the existing file format for the forseeable future, and we might as well fix
y2038 while we're there (treating it as unsigned buys us a lot of decades, but
as long as we're bumping the version number anyway...).

Otherwise it tries to be the minimal set of changes to get us there. (My first
stab at this was dealing with sparse files, but runs of zeroes gzip pretty well
and tmpfs could always make itself sparse after the fact...)

> Doing it under the non-justification of expedience ("oh, we can share most>
> of the code") is aggravated engineering malpractice.

Coming from the guy who added perl as a build dependency to every project he
maintained simultaneously (the linux kernel, your bootloader, klibc), that seems
a lot more like an opinion than an objective metric.

> It is entirely possible that the modern posix tar/pax format is too complex

In the link above you declared it too complex in 2001. Partly because the gnu
tar and pax formats aren't really the same format.

> to be practical in this case – that would be justifying a new format.  But
> then you are taking the fundamental cost of breakage, and then the new format
> definitely should not be replicating known defects of another format and
> without at least some thought about how to avoid it in the future.

Didn't Linus flame more than one developer for ripping things out and replacing
them with a new untested thing rather than leaving a trail of breadcrumbs from a
known working thing to another known working thing? Or has the right way to do
it changed since the 2.5 development cycle?

Strangely the poor souls suffering under the burden of cpio to use initramfs
today haven't been screaming out their agony in a detectable way. (They're mad
the kernel doesn't give better feedback about why init failed to launch and it
either paniced or fell through to the fallback ROOT=, my patch to make
devtmpfs_mount work for initramfs was trying to fix the "you pointed the kernel
at a root filesystem directory which it cpio'd up but there was /dev/console in
it so your init has no stdin/stdout/stderr and dies immediately because of it"
problem. And the recent thread about "please don't add a third knob to make
initramfs be tmpfs instead of ramfs" was another corner case of that). And I
have half an INITRAMFS_VERBOSE patch around here somewhere to printk() a lot
more status (and I need to update the initramfs documentation I wrote to help
people have an easier time using it...)

But that's not about archive format. That's kernel userspace bringup being
persnickety. The silent majority you speak for on this archive format issue is
pretty darn silent.

Was this recorded as a problem for you before somebody suggested changing it? I
tend to be public about https://twitter.com/landley/status/964620648050982912
and collect links to other people's concerns when I notice...

Or is this just your opinion?

Rob
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mimi Zohar Feb. 18, 2018, 12:15 a.m. UTC | #7
On Fri, 2018-02-16 at 12:59 -0800, H. Peter Anvin wrote:
> On 02/16/18 12:33, Taras Kondratiuk wrote:
> > Many of the Linux security/integrity features are dependent on file
> > metadata, stored as extended attributes (xattrs), for making decisions.
> > These features need to be initialized during initcall and enabled as
> > early as possible for complete security coverage.
> > 
> > Initramfs (tmpfs) supports xattrs, but newc CPIO archive format does not
> > support including them into the archive.
> > 
> > This patch describes "extended" newc format (newcx) that is based on
> > newc and has following changes:
> > - extended attributes support
> > - increased size of filesize to support files >4GB
> > - increased mtime field size to have 64 bits of seconds and added a
> >   field for nanoseconds
> > - removed unused checksum field
> > 
> 
> If you are going to implement a new, non-backwards-compatible format,
> you shouldn't replicate the mistakes of the current format.  Specifically:
> 
> 1. The use of ASCII-encoded fixed-length numbers is an idiotic legacy
> from an era before there were any portable way of dealing with numbers
> with prespecified endianness.  If you are going to use ASCII, make them
> delimited so that they don't have fixed limits, or just use binary.
> 
> The cpio header isn't fixed size, so that argument goes away, in fact
> the only way to determine the end of the header is to scan forward.
> 
> 2. Alignment sensitivity!  Because there is no header length
> information, the above scan tells you where the header ends, but there
> is padding before the data, and the size of that padding is only defined
> by alignment.
> 
> 3. Inband encoding of EOF: if you actually have a filename "TRAILER!!!"
> you have problems.
> 
> But first, before you define a whole new format for which no tools exist
> (you will have to work with the maintainers of the GNU tools to add
> support) you should see how complex it would be to support the POSIX
> tar/pax format, which already has all the features you are seeking, and
> by now is well-supported.

The discussion about including xattrs in the initramfs didn't start
yesterday.  It's been on the list of measurement/appraisal gaps that
need to be closed for years.  Initially I planned on using tar, but at
the 2014 Kernel Summit I spoke with Al at length.  At the time, he was
very clear that tar is unnecessarily overly complicated and
recommended extending CPIO.

I took his advice.  Unfortunately, as soon as I posted an initial
patch set to include xattrs in CPIO, all of the problems with CPIO had
to be addressed before defining a new CPIO number.  Unfortunately,
this wasn't the only measurement/appraisal gap that needed to be
addressed.  I've been working on closing other gaps.

I'm really happy that someone has taken the time to work on this.
 Instead of derailing their attempt of extending CPIO to include
xattrs, I'd appreciate your making constructive suggestions.

Mimi

--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
H. Peter Anvin Feb. 18, 2018, 12:24 a.m. UTC | #8
On February 17, 2018 4:15:12 PM PST, Mimi Zohar <zohar@linux.vnet.ibm.com> wrote:
>On Fri, 2018-02-16 at 12:59 -0800, H. Peter Anvin wrote:
>> On 02/16/18 12:33, Taras Kondratiuk wrote:
>> > Many of the Linux security/integrity features are dependent on file
>> > metadata, stored as extended attributes (xattrs), for making
>decisions.
>> > These features need to be initialized during initcall and enabled
>as
>> > early as possible for complete security coverage.
>> > 
>> > Initramfs (tmpfs) supports xattrs, but newc CPIO archive format
>does not
>> > support including them into the archive.
>> > 
>> > This patch describes "extended" newc format (newcx) that is based
>on
>> > newc and has following changes:
>> > - extended attributes support
>> > - increased size of filesize to support files >4GB
>> > - increased mtime field size to have 64 bits of seconds and added a
>> >   field for nanoseconds
>> > - removed unused checksum field
>> > 
>> 
>> If you are going to implement a new, non-backwards-compatible format,
>> you shouldn't replicate the mistakes of the current format. 
>Specifically:
>> 
>> 1. The use of ASCII-encoded fixed-length numbers is an idiotic legacy
>> from an era before there were any portable way of dealing with
>numbers
>> with prespecified endianness.  If you are going to use ASCII, make
>them
>> delimited so that they don't have fixed limits, or just use binary.
>> 
>> The cpio header isn't fixed size, so that argument goes away, in fact
>> the only way to determine the end of the header is to scan forward.
>> 
>> 2. Alignment sensitivity!  Because there is no header length
>> information, the above scan tells you where the header ends, but
>there
>> is padding before the data, and the size of that padding is only
>defined
>> by alignment.
>> 
>> 3. Inband encoding of EOF: if you actually have a filename
>"TRAILER!!!"
>> you have problems.
>> 
>> But first, before you define a whole new format for which no tools
>exist
>> (you will have to work with the maintainers of the GNU tools to add
>> support) you should see how complex it would be to support the POSIX
>> tar/pax format, which already has all the features you are seeking,
>and
>> by now is well-supported.
>
>The discussion about including xattrs in the initramfs didn't start
>yesterday.  It's been on the list of measurement/appraisal gaps that
>need to be closed for years.  Initially I planned on using tar, but at
>the 2014 Kernel Summit I spoke with Al at length.  At the time, he was
>very clear that tar is unnecessarily overly complicated and
>recommended extending CPIO.
>
>I took his advice.  Unfortunately, as soon as I posted an initial
>patch set to include xattrs in CPIO, all of the problems with CPIO had
>to be addressed before defining a new CPIO number.  Unfortunately,
>this wasn't the only measurement/appraisal gap that needed to be
>addressed.  I've been working on closing other gaps.
>
>I'm really happy that someone has taken the time to work on this.
> Instead of derailing their attempt of extending CPIO to include
>xattrs, I'd appreciate your making constructive suggestions.
>
>Mimi

I'm not trying to derail anything, but I do want to see it done well if it is going to be done.  I have several ideas already, but I may not have a chance to write them down properly until after the weekend due to family obligations.

The assessment of pax/tar is useful; it should be added to the patch documentation in a future set.
H. Peter Anvin Feb. 18, 2018, 12:26 a.m. UTC | #9
On February 17, 2018 4:15:12 PM PST, Mimi Zohar <zohar@linux.vnet.ibm.com> wrote:
>On Fri, 2018-02-16 at 12:59 -0800, H. Peter Anvin wrote:
>> On 02/16/18 12:33, Taras Kondratiuk wrote:
>> > Many of the Linux security/integrity features are dependent on file
>> > metadata, stored as extended attributes (xattrs), for making
>decisions.
>> > These features need to be initialized during initcall and enabled
>as
>> > early as possible for complete security coverage.
>> > 
>> > Initramfs (tmpfs) supports xattrs, but newc CPIO archive format
>does not
>> > support including them into the archive.
>> > 
>> > This patch describes "extended" newc format (newcx) that is based
>on
>> > newc and has following changes:
>> > - extended attributes support
>> > - increased size of filesize to support files >4GB
>> > - increased mtime field size to have 64 bits of seconds and added a
>> >   field for nanoseconds
>> > - removed unused checksum field
>> > 
>> 
>> If you are going to implement a new, non-backwards-compatible format,
>> you shouldn't replicate the mistakes of the current format. 
>Specifically:
>> 
>> 1. The use of ASCII-encoded fixed-length numbers is an idiotic legacy
>> from an era before there were any portable way of dealing with
>numbers
>> with prespecified endianness.  If you are going to use ASCII, make
>them
>> delimited so that they don't have fixed limits, or just use binary.
>> 
>> The cpio header isn't fixed size, so that argument goes away, in fact
>> the only way to determine the end of the header is to scan forward.
>> 
>> 2. Alignment sensitivity!  Because there is no header length
>> information, the above scan tells you where the header ends, but
>there
>> is padding before the data, and the size of that padding is only
>defined
>> by alignment.
>> 
>> 3. Inband encoding of EOF: if you actually have a filename
>"TRAILER!!!"
>> you have problems.
>> 
>> But first, before you define a whole new format for which no tools
>exist
>> (you will have to work with the maintainers of the GNU tools to add
>> support) you should see how complex it would be to support the POSIX
>> tar/pax format, which already has all the features you are seeking,
>and
>> by now is well-supported.
>
>The discussion about including xattrs in the initramfs didn't start
>yesterday.  It's been on the list of measurement/appraisal gaps that
>need to be closed for years.  Initially I planned on using tar, but at
>the 2014 Kernel Summit I spoke with Al at length.  At the time, he was
>very clear that tar is unnecessarily overly complicated and
>recommended extending CPIO.
>
>I took his advice.  Unfortunately, as soon as I posted an initial
>patch set to include xattrs in CPIO, all of the problems with CPIO had
>to be addressed before defining a new CPIO number.  Unfortunately,
>this wasn't the only measurement/appraisal gap that needed to be
>addressed.  I've been working on closing other gaps.
>
>I'm really happy that someone has taken the time to work on this.
> Instead of derailing their attempt of extending CPIO to include
>xattrs, I'd appreciate your making constructive suggestions.
>
>Mimi

Do you have a description of the gaps you have identified?
Mimi Zohar Feb. 18, 2018, 12:47 a.m. UTC | #10
On Sat, 2018-02-17 at 16:26 -0800, hpa@zytor.com wrote:

> Do you have a description of the gaps you have identified?

Probably the 2016 Linux Security Summit (LSS) integrity status update
has the best list.

http://events17.linuxfoundation.org/sites/events/files/slides/LSS2016-
LinuxIntegritySubsystemStatus.pdf

Mimi

--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/Documentation/early-userspace/buffer-format.txt b/Documentation/early-userspace/buffer-format.txt
index e1fd7f9dad16..8349b2f28b4b 100644
--- a/Documentation/early-userspace/buffer-format.txt
+++ b/Documentation/early-userspace/buffer-format.txt
@@ -24,6 +24,7 @@  grammar, where:
 	+	indicates concatenation
 	GZIP()	indicates the gzip(1) of the operand
 	ALGN(n)	means padding with null bytes to an n-byte boundary
+	[n]	means size of field is n bytes
 
 	initramfs  := ("\0" | cpio_archive | cpio_gzip_archive)*
 
@@ -31,20 +32,33 @@  grammar, where:
 
 	cpio_archive := cpio_file* + (<nothing> | cpio_trailer)
 
-	cpio_file := ALGN(4) + cpio_header + filename + "\0" + ALGN(4) + data
+	cpio_file := (cpio_newc_file | cpio_newcx_file)
+
+	cpio_newc_file := ALGN(4) + cpio_newc_header + filename + "\0" + \
+			  ALGN(4) + data
+
+	cpio_newcx_file := ALGN(4) + cpio_newcx_header + filename + "\0" + \
+			   ALGN(4) + xattrs + ALGN(4) + data
+
+	xattrs := xattr_entry*
+
+	xattr_entry := xattr_size[8] + xattr_name + "\0" + xattr_value
 
 	cpio_trailer := ALGN(4) + cpio_header + "TRAILER!!!\0" + ALGN(4)
 
 
 In human terms, the initramfs buffer contains a collection of
-compressed and/or uncompressed cpio archives (in the "newc" or "crc"
-formats); arbitrary amounts zero bytes (for padding) can be added
-between members.
+compressed and/or uncompressed cpio archives; arbitrary amounts
+zero bytes (for padding) can be added between members.
 
 The cpio "TRAILER!!!" entry (cpio end-of-archive) is optional, but is
 not ignored; see "handling of hard links" below.
 
-The structure of the cpio_header is as follows (all fields contain
+xattr_size is a total size of xattr_entry including 8 bytes of
+xattr_size. xattr_size has the same hexadecimal ASCII encoding as other
+fields of cpio header (see below).
+
+The structure of the cpio_newc_header is as follows (all fields contain
 hexadecimal ASCII numbers fully padded with '0' on the left to the
 full width of the field, for example, the integer 4780 is represented
 by the ASCII string "000012ac"):
@@ -81,6 +95,28 @@  algorithm used.
 If the filename is "TRAILER!!!" this is actually an end-of-archive
 marker; the c_filesize for an end-of-archive marker must be zero.
 
+"Extended" newc format (newcx)
+"newcx" cpio format extends "newc" by increasing size of some fields
+and adding extended attributes support. cpio_newcx_header structure:
+
+Field name    Field size	 Meaning
+c_magic	       6 bytes		 The string "070703"
+c_ino	       8 bytes		 File inode number
+c_mode	       8 bytes		 File mode and permissions
+c_uid	       8 bytes		 File uid
+c_gid	       8 bytes		 File gid
+c_nlink	       8 bytes		 Number of links
+c_mtime	      16 bytes		 Modification time (seconds)
+c_mtime_nsec   8 bytes		 Modification time (nanoseconds)
+c_filesize    16 bytes		 Size of data field
+c_maj	       8 bytes		 Major part of file device number
+c_min	       8 bytes		 Minor part of file device number
+c_rmaj	       8 bytes		 Major part of device node reference
+c_rmin	       8 bytes		 Minor part of device node reference
+c_namesize     8 bytes		 Length of filename, including final \0
+c_xattrs_size  8 bytes		 Size of xattrs field
+
+Most of the fields match cpio_newc_header. c_chksum field is dropped.
 
 *** Handling of hard links