libelf: Fix div0 issues in elf_{shdr, phdr}_count()

Message ID	1481206738-10702-1-git-send-email-andrew.cooper3@citrix.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <xen-devel-bounces@lists.xen.org> From: Andrew Cooper <andrew.cooper3@citrix.com> To: Xen-devel <xen-devel@lists.xen.org> Date: Thu, 8 Dec 2016 14:18:58 +0000 Message-ID: <1481206738-10702-1-git-send-email-andrew.cooper3@citrix.com> MIME-Version: 1.0 Cc: Stefano Stabellini <sstabellini@kernel.org>, Wei Liu <wei.liu2@citrix.com>, George Dunlap <George.Dunlap@eu.citrix.com>, Andrew Cooper <andrew.cooper3@citrix.com>, Ian Jackson <ian.jackson@eu.citrix.com>, Tim Deegan <tim@xen.org>, Jan Beulich <JBeulich@suse.com> Subject: [Xen-devel] [PATCH] libelf: Fix div0 issues in elf_{shdr, phdr}_count() Precedence: list Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" <xen-devel-bounces@lists.xen.org>

Andrew Cooper Dec. 8, 2016, 2:18 p.m. UTC

elf_uval() can return zero either because the field itself is zero, or because
the access is out of bounds.

c/s a01b6d4 "libelf: treat phdr and shdr similarly" introduced two div0 issues
as e_{ph,sh}entsize are not checked for sanity before being used to divide
elf->size.

Spotted by Coverity.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Tim Deegan <tim@xen.org>
CC: Wei Liu <wei.liu2@citrix.com>

I experimented with making elf_access_unsigned() __must_check, but this didn't
cause a compiler error.  I am not quite sure why.
---
 xen/common/libelf/libelf-tools.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

Jan Beulich Dec. 8, 2016, 2:41 p.m. UTC | #1

>>> On 08.12.16 at 15:18, <andrew.cooper3@citrix.com> wrote:
> elf_uval() can return zero either because the field itself is zero, or because
> the access is out of bounds.
> 
> c/s a01b6d4 "libelf: treat phdr and shdr similarly" introduced two div0 issues
> as e_{ph,sh}entsize are not checked for sanity before being used to divide
> elf->size.
> 
> Spotted by Coverity.

And wrongly so, imo.

> --- a/xen/common/libelf/libelf-tools.c
> +++ b/xen/common/libelf/libelf-tools.c
> @@ -130,11 +130,17 @@ uint64_t elf_round_up(struct elf_binary *elf, uint64_t addr)
>  unsigned elf_shdr_count(struct elf_binary *elf)
>  {
>      unsigned count = elf_uval(elf, elf->ehdr, e_shnum);
> +    unsigned entsize = elf_uval(elf, elf->ehdr, e_shentsize);
>      uint64_t max;
>  
>      if ( !count )
>          return 0;
> -    max = elf->size / elf_uval(elf, elf->ehdr, e_shentsize);
> +    if ( !entsize )
> +    {
> +        elf_mark_broken(elf, "e_shentsize is zero");
> +        return 0;
> +    }

This as well as ...

> @@ -148,11 +154,17 @@ unsigned elf_shdr_count(struct elf_binary *elf)
>  unsigned elf_phdr_count(struct elf_binary *elf)
>  {
>      unsigned count = elf_uval(elf, elf->ehdr, e_phnum);
> +    unsigned entsize = elf_uval(elf, elf->ehdr, e_phentsize);
>      uint64_t max;
>  
>      if ( !count )
>          return 0;
> -    max = elf->size / elf_uval(elf, elf->ehdr, e_phentsize);
> +    if ( !entsize )
> +    {
> +        elf_mark_broken(elf, "e_phentsize is zero");
> +        return 0;
> +    }

... this would end up being dead code, due to the checks the same
patch you refer to introduced in elf_init().

Jan

Andrew Cooper Dec. 8, 2016, 2:46 p.m. UTC | #2

On 08/12/16 14:41, Jan Beulich wrote:
>>>> On 08.12.16 at 15:18, <andrew.cooper3@citrix.com> wrote:
>> elf_uval() can return zero either because the field itself is zero, or because
>> the access is out of bounds.
>>
>> c/s a01b6d4 "libelf: treat phdr and shdr similarly" introduced two div0 issues
>> as e_{ph,sh}entsize are not checked for sanity before being used to divide
>> elf->size.
>>
>> Spotted by Coverity.
> And wrongly so, imo.
>
>> --- a/xen/common/libelf/libelf-tools.c
>> +++ b/xen/common/libelf/libelf-tools.c
>> @@ -130,11 +130,17 @@ uint64_t elf_round_up(struct elf_binary *elf, uint64_t addr)
>>  unsigned elf_shdr_count(struct elf_binary *elf)
>>  {
>>      unsigned count = elf_uval(elf, elf->ehdr, e_shnum);
>> +    unsigned entsize = elf_uval(elf, elf->ehdr, e_shentsize);
>>      uint64_t max;
>>  
>>      if ( !count )
>>          return 0;
>> -    max = elf->size / elf_uval(elf, elf->ehdr, e_shentsize);
>> +    if ( !entsize )
>> +    {
>> +        elf_mark_broken(elf, "e_shentsize is zero");
>> +        return 0;
>> +    }
> This as well as ...
>
>> @@ -148,11 +154,17 @@ unsigned elf_shdr_count(struct elf_binary *elf)
>>  unsigned elf_phdr_count(struct elf_binary *elf)
>>  {
>>      unsigned count = elf_uval(elf, elf->ehdr, e_phnum);
>> +    unsigned entsize = elf_uval(elf, elf->ehdr, e_phentsize);
>>      uint64_t max;
>>  
>>      if ( !count )
>>          return 0;
>> -    max = elf->size / elf_uval(elf, elf->ehdr, e_phentsize);
>> +    if ( !entsize )
>> +    {
>> +        elf_mark_broken(elf, "e_phentsize is zero");
>> +        return 0;
>> +    }
> ... this would end up being dead code, due to the checks the same
> patch you refer to introduced in elf_init().

Are you sure?  elf_init() currently looks like this:

    /* Sanity check phdr if present. */
    count = elf_phdr_count(elf);
    if ( count )
    {
        if ( elf_uval(elf, elf->ehdr, e_phentsize) <
             elf_size(elf, ELF_HANDLE_DECL(elf_phdr)) )
        {
            elf_err(elf, "ELF: phdr too small (%" PRIu64 ")\n",
                    elf_uval(elf, elf->ehdr, e_phentsize));
            return -1;
        }

There is no check of e_phentsize before it is used to divide with.

~Andrew

Ian Jackson Dec. 8, 2016, 2:48 p.m. UTC | #3

Jan Beulich writes ("Re: [PATCH] libelf: Fix div0 issues in elf_{shdr,phdr}_count()"):
> On 08.12.16 at 15:18, <andrew.cooper3@citrix.com> wrote:
> > Spotted by Coverity.
> 
> And wrongly so, imo.
...
> > @@ -148,11 +154,17 @@ unsigned elf_shdr_count(struct elf_binary *elf)
> >  unsigned elf_phdr_count(struct elf_binary *elf)
> >  {
> >      unsigned count = elf_uval(elf, elf->ehdr, e_phnum);
> > +    unsigned entsize = elf_uval(elf, elf->ehdr, e_phentsize);
...
> ... this would end up being dead code, due to the checks the same
> patch you refer to introduced in elf_init().

I think I would prefer code that is obviously correct from local
inspection.  There is no performance implication here.

Maybe what we want is elf_divide() which implments anything/0 as 0.

Ian.

Jan Beulich Dec. 8, 2016, 3:17 p.m. UTC | #4

>>> On 08.12.16 at 15:46, <andrew.cooper3@citrix.com> wrote:
> On 08/12/16 14:41, Jan Beulich wrote:
>>>>> On 08.12.16 at 15:18, <andrew.cooper3@citrix.com> wrote:
>>> elf_uval() can return zero either because the field itself is zero, or 
> because
>>> the access is out of bounds.
>>>
>>> c/s a01b6d4 "libelf: treat phdr and shdr similarly" introduced two div0 
> issues
>>> as e_{ph,sh}entsize are not checked for sanity before being used to divide
>>> elf->size.
>>>
>>> Spotted by Coverity.
>> And wrongly so, imo.
>>
>>> --- a/xen/common/libelf/libelf-tools.c
>>> +++ b/xen/common/libelf/libelf-tools.c
>>> @@ -130,11 +130,17 @@ uint64_t elf_round_up(struct elf_binary *elf, uint64_t 
> addr)
>>>  unsigned elf_shdr_count(struct elf_binary *elf)
>>>  {
>>>      unsigned count = elf_uval(elf, elf->ehdr, e_shnum);
>>> +    unsigned entsize = elf_uval(elf, elf->ehdr, e_shentsize);
>>>      uint64_t max;
>>>  
>>>      if ( !count )
>>>          return 0;
>>> -    max = elf->size / elf_uval(elf, elf->ehdr, e_shentsize);
>>> +    if ( !entsize )
>>> +    {
>>> +        elf_mark_broken(elf, "e_shentsize is zero");
>>> +        return 0;
>>> +    }
>> This as well as ...
>>
>>> @@ -148,11 +154,17 @@ unsigned elf_shdr_count(struct elf_binary *elf)
>>>  unsigned elf_phdr_count(struct elf_binary *elf)
>>>  {
>>>      unsigned count = elf_uval(elf, elf->ehdr, e_phnum);
>>> +    unsigned entsize = elf_uval(elf, elf->ehdr, e_phentsize);
>>>      uint64_t max;
>>>  
>>>      if ( !count )
>>>          return 0;
>>> -    max = elf->size / elf_uval(elf, elf->ehdr, e_phentsize);
>>> +    if ( !entsize )
>>> +    {
>>> +        elf_mark_broken(elf, "e_phentsize is zero");
>>> +        return 0;
>>> +    }
>> ... this would end up being dead code, due to the checks the same
>> patch you refer to introduced in elf_init().
> 
> Are you sure?  elf_init() currently looks like this:
> 
>     /* Sanity check phdr if present. */
>     count = elf_phdr_count(elf);
>     if ( count )
>     {
>         if ( elf_uval(elf, elf->ehdr, e_phentsize) <
>              elf_size(elf, ELF_HANDLE_DECL(elf_phdr)) )
>         {
>             elf_err(elf, "ELF: phdr too small (%" PRIu64 ")\n",
>                     elf_uval(elf, elf->ehdr, e_phentsize));
>             return -1;
>         }
> 
> There is no check of e_phentsize before it is used to divide with.

Oh, I see - elf_phdr_count() itself relies on the check that's
about to be done. But I think the correct thing then would be to
not use elf_phdr_count() here, but read the raw field instead.
You patch basically adds a weaker check there which is then
immediately being followed by the stronger check above.

And looking at it from that angle it's then questionable why
elf_{p,s}hdr_count() do any checking at all - this should all be
done once in elf_init(). IOW I did the adjustment in the wrong
way, and I really should have made elf_shdr_count() match
elf_phdr_count() (moving the checks into elf_init()).

And looking at elf_init() again I also realize that I blindly
copied the range checks, calculation of which can overflow.
I think it would be better to do those checks such that
overflow results in an error return.

I wonder if it wouldn't be better to revert that change, for
me to produce a better v2.

Jan

Andrew Cooper Dec. 8, 2016, 3:23 p.m. UTC | #5

On 08/12/16 15:17, Jan Beulich wrote:
>>>> On 08.12.16 at 15:46, <andrew.cooper3@citrix.com> wrote:
>> On 08/12/16 14:41, Jan Beulich wrote:
>>>>>> On 08.12.16 at 15:18, <andrew.cooper3@citrix.com> wrote:
>>>> elf_uval() can return zero either because the field itself is zero, or 
>> because
>>>> the access is out of bounds.
>>>>
>>>> c/s a01b6d4 "libelf: treat phdr and shdr similarly" introduced two div0 
>> issues
>>>> as e_{ph,sh}entsize are not checked for sanity before being used to divide
>>>> elf->size.
>>>>
>>>> Spotted by Coverity.
>>> And wrongly so, imo.
>>>
>>>> --- a/xen/common/libelf/libelf-tools.c
>>>> +++ b/xen/common/libelf/libelf-tools.c
>>>> @@ -130,11 +130,17 @@ uint64_t elf_round_up(struct elf_binary *elf, uint64_t 
>> addr)
>>>>  unsigned elf_shdr_count(struct elf_binary *elf)
>>>>  {
>>>>      unsigned count = elf_uval(elf, elf->ehdr, e_shnum);
>>>> +    unsigned entsize = elf_uval(elf, elf->ehdr, e_shentsize);
>>>>      uint64_t max;
>>>>  
>>>>      if ( !count )
>>>>          return 0;
>>>> -    max = elf->size / elf_uval(elf, elf->ehdr, e_shentsize);
>>>> +    if ( !entsize )
>>>> +    {
>>>> +        elf_mark_broken(elf, "e_shentsize is zero");
>>>> +        return 0;
>>>> +    }
>>> This as well as ...
>>>
>>>> @@ -148,11 +154,17 @@ unsigned elf_shdr_count(struct elf_binary *elf)
>>>>  unsigned elf_phdr_count(struct elf_binary *elf)
>>>>  {
>>>>      unsigned count = elf_uval(elf, elf->ehdr, e_phnum);
>>>> +    unsigned entsize = elf_uval(elf, elf->ehdr, e_phentsize);
>>>>      uint64_t max;
>>>>  
>>>>      if ( !count )
>>>>          return 0;
>>>> -    max = elf->size / elf_uval(elf, elf->ehdr, e_phentsize);
>>>> +    if ( !entsize )
>>>> +    {
>>>> +        elf_mark_broken(elf, "e_phentsize is zero");
>>>> +        return 0;
>>>> +    }
>>> ... this would end up being dead code, due to the checks the same
>>> patch you refer to introduced in elf_init().
>> Are you sure?  elf_init() currently looks like this:
>>
>>     /* Sanity check phdr if present. */
>>     count = elf_phdr_count(elf);
>>     if ( count )
>>     {
>>         if ( elf_uval(elf, elf->ehdr, e_phentsize) <
>>              elf_size(elf, ELF_HANDLE_DECL(elf_phdr)) )
>>         {
>>             elf_err(elf, "ELF: phdr too small (%" PRIu64 ")\n",
>>                     elf_uval(elf, elf->ehdr, e_phentsize));
>>             return -1;
>>         }
>>
>> There is no check of e_phentsize before it is used to divide with.
> Oh, I see - elf_phdr_count() itself relies on the check that's
> about to be done. But I think the correct thing then would be to
> not use elf_phdr_count() here, but read the raw field instead.
> You patch basically adds a weaker check there which is then
> immediately being followed by the stronger check above.
>
> And looking at it from that angle it's then questionable why
> elf_{p,s}hdr_count() do any checking at all - this should all be
> done once in elf_init(). IOW I did the adjustment in the wrong
> way, and I really should have made elf_shdr_count() match
> elf_phdr_count() (moving the checks into elf_init()).
>
> And looking at elf_init() again I also realize that I blindly
> copied the range checks, calculation of which can overflow.
> I think it would be better to do those checks such that
> overflow results in an error return.
>
> I wonder if it wouldn't be better to revert that change, for
> me to produce a better v2.

Feel free.

I have to admit that I find it odd that the values aren't sanitised
once, then copied out into local good state.  The repeated use of
elf_uval() risks a lot of zeroes because of its range check case.

~Andrew

Ian Jackson Dec. 8, 2016, 3:47 p.m. UTC | #6

Andrew Cooper writes ("Re: [PATCH] libelf: Fix div0 issues in elf_{shdr,phdr}_count()"):
> On 08/12/16 15:17, Jan Beulich wrote:
> > Oh, I see - elf_phdr_count() itself relies on the check that's
> > about to be done. But I think the correct thing then would be to
> > not use elf_phdr_count() here, but read the raw field instead.
> > You patch basically adds a weaker check there which is then
> > immediately being followed by the stronger check above.

There must be no "reading of raw fields".  You may use
elf_access_unsigned if you really want to.

> > And looking at it from that angle it's then questionable why
> > elf_{p,s}hdr_count() do any checking at all - this should all be
> > done once in elf_init(). IOW I did the adjustment in the wrong
> > way, and I really should have made elf_shdr_count() match
> > elf_phdr_count() (moving the checks into elf_init()).

The point of this checking is not to avoid overflow, but just to
ensure that the loops which rely on elf_*_count do not iterate "far
too much".

> > And looking at elf_init() again I also realize that I blindly
> > copied the range checks, calculation of which can overflow.

These possibly-faulty range checks are harmless because they all
operate on unsigned values.

Frankly I'm not sure what the point of a01b6d46 was ?

> > I think it would be better to do those checks such that
> > overflow results in an error return.

The design principle behind my fixes to XSA-55 is to write the bulk of
libelf in a subset of C which is always safe.

This means:

 * Always using unsigned integers (so no signed integer overflow).

 * Doing all pointer dereferences into the ELF block using an
   access function which does a bounds check.  (And this also
   means representing all pointers as unsigned offsets, so that we
   avoid calculating basilisk pointer values.)

 * Explicitly limiting the iteration count of every loop where
   necessayr (to avoid DOS attacks from bad images).

 * Not using unsafe operations like division by input-controlled
   values.

Sticking to these rules is a lot easier than writing explicit checks.
This is because these explicit checks are very easy to omit, or get
wrong.  The pointers are all encapsulated in a special type which
prevents uncontrolled dereference.

Admittedly the rule about iteration count is not so easy to remember,
and I had failed to anticipate that someone would introduce division.
But both of those kind of bugs are denial of service, rather than
privilege escalation.

Having stared at the commit message of a01b6d46 and at the
pre-a01b6d46 code, I still don't understand what motivated the
changes.

Ian.

Jan Beulich Dec. 8, 2016, 4:09 p.m. UTC | #7

>>> On 08.12.16 at 16:47, <ian.jackson@eu.citrix.com> wrote:
> Andrew Cooper writes ("Re: [PATCH] libelf: Fix div0 issues in 
> elf_{shdr,phdr}_count()"):
>> On 08/12/16 15:17, Jan Beulich wrote:
>> > Oh, I see - elf_phdr_count() itself relies on the check that's
>> > about to be done. But I think the correct thing then would be to
>> > not use elf_phdr_count() here, but read the raw field instead.
>> > You patch basically adds a weaker check there which is then
>> > immediately being followed by the stronger check above.
> 
> There must be no "reading of raw fields".  You may use
> elf_access_unsigned if you really want to.

Oh, by "raw" I meant using elf_uval() rather than elf_phdr_count().

>> > And looking at it from that angle it's then questionable why
>> > elf_{p,s}hdr_count() do any checking at all - this should all be
>> > done once in elf_init(). IOW I did the adjustment in the wrong
>> > way, and I really should have made elf_shdr_count() match
>> > elf_phdr_count() (moving the checks into elf_init()).
> 
> The point of this checking is not to avoid overflow, but just to
> ensure that the loops which rely on elf_*_count do not iterate "far
> too much".
> 
>> > And looking at elf_init() again I also realize that I blindly
>> > copied the range checks, calculation of which can overflow.
> 
> These possibly-faulty range checks are harmless because they all
> operate on unsigned values.

Why does operating on unsigned values make overflow harmless?

> Frankly I'm not sure what the point of a01b6d46 was ?

First of all I'd like to refer you to its description, but see below.
Are you suggesting that all those adjustments appear pointless
to you?

>  * Not using unsafe operations like division by input-controlled
>    values.

But using a hard coded but possibly wrong value instead isn't
really a solution.

> Having stared at the commit message of a01b6d46 and at the
> pre-a01b6d46 code, I still don't understand what motivated the
> changes.

Hmm, that's interesting. Are you then saying all the changes it
did and described are wrong? Pointless? I could do with some
clarification here, as otherwise it's not clear whether spending
time on producing an improved v2 is actually worth it.

Jan

Ian Jackson Dec. 8, 2016, 5:28 p.m. UTC | #8

Ian Jackson writes ("Re: [PATCH] libelf: Fix div0 issues in elf_{shdr,phdr}_count()"):
> Admittedly the rule about iteration count is not so easy to remember,
> and I had failed to anticipate that someone would introduce division.
> But both of those kind of bugs are denial of service, rather than
> privilege escalation.
> 
> Having stared at the commit message of a01b6d46 and at the
> pre-a01b6d46 code, I still don't understand what motivated the
> changes.

Jan and I had a conversation on irc.


Part of the motivation for a01b6d4 was the anomaly which is the
difference between the implementations of elf_phdr_size and
elf_shdr_size.

This was introduced in 39bf7b9d0ae5 "libelf: check loops for running
away".  That was part of the XSA-55 patchset.

Looking at the commit message I was concerned that the phdr and shdr
counts might be excessive, and that libelf might get stuck in a loop
with an unreasonably high iteration count.

I appear to have attempted to mitigate this by:

 (a) in the case of shdrs, by using elf_access_ok inside each
    loop, on the principle that an out of control loop count
    would walk off the end of the image and stop;

 (b) in the case of phdrs, making an explicit limit of
    (image size) / sizeof(phdr).  NB that sizeof(phdr) is not the
    actual size of phdrs; for a valid ELF it is a minimum.  This
    is fine because we're computing a backstop, which does not
    have to be entirely accurate.

I now see that (a) is ineffective because the image can specify its
own stride for the iteration (the header size).  If it specifies a
stride of zero, the maximum count is unbounded.

However, both count values are actually 16-bit.  So there is already a
limit.  The extra code in elf_phdr_size is not necessary.

IMO this near-miss demonstrates that a more robust approach is
required to bounding loops in libelf.

A possibility would be to introduce
  bool elf_iter_cont(struct elf_binary *elf, size_t copysz);
and using it in every for(;;) in libelf.  It would keep a counter (set
to zero in libelf) and abandon the processing if the total of all the
copysz values passed was "too large".  (copysz would be bounded below
by 1, so that it is safe to pass a size from the ELF.)

Then

-     for ( i = 0; i < elf_shdr_count(elf); i++ )
+     for ( i = 0; elf_iter_cont(elf, e_shentsize)
+          && i < elf_shdr_count(elf); i++ )

An alternative would be to decorate every loop with a comment
explaining why the loop does reasonably-bounded work.  I'm not sure
whether this, or my more sophisticated suggestion, will find favour.

In any case, the max calculation in elf_phdr_size is unnecessary for
security, and has no purpose related to functionality, and can be
deleted.


Another part of the motivation for a01b6d4 is to produce better
messages for certain malformed ELFs.  There has been no assertion that
any such ELFs (ie ELFs which would fail these new checks) actually
exist or have caused trouble for anyone.

I think that it is a bad idea to add unnecessary checking to libelf.

libelf is a format parser on a security boundary.  Despite attempts to
provide a safe dialect to write in, every piece of code in libelf
risks security problems.

If libelf "accepts" some malformed image, and constructs a mess in the
guest memory space, then that is unfortunate.  But it is not a big
problem.  The guest will probably crash or go wrong, but that is not a
problem for the host.

If someone is faced with fallout from a malformed ELF, they would
hopefully try use an object file inspector like objdump on the loaded
image.  That would reveal the problem.  (And of course ELFs are mostly
generated by a fairly small ecosystem of toolchain programs.)


Also, this has demonstrated that the design principles underlying the
safety of libelf are not properly understood.  IIRC they were
explained in the XSA-55 00/ patch, but that's not sufficient.


So I would prefer to:


Revert all of a01b6d4.


Delete the header count check in elf_phdr_count and replace it with a
compile-time assertion, in elf_phdr_count and in elf_shdr_count, that
sizeof the count field is <= 2.  (This may need a new macro
ELF_HANDLE_FIELD_SIZEOF implemented in terms of
ELF__HANDLE_FIELD_TYPE.)


Add a comment near the top of libelf.h explaining the rules for
programming inside libelf, to include:

  * Arithmetic on signed values is forbidden

  * Use of actual pointers into the ELF is forbidden

  * Division by non-constant values is forbidden

  * All loops must ensure that they have a reasonably low
    iteration count

  * Code (including ELF format sanity checks) which is necessary
    neither for safety nor for correct processing of correct files
    should not be added to libelf.  It is not libelf's role to
    be an ELF validator.


Thanks,
Ian.

Jan Beulich Dec. 9, 2016, 8:38 a.m. UTC | #9

>>> On 08.12.16 at 18:28, <ian.jackson@eu.citrix.com> wrote:
> Part of the motivation for a01b6d4 was the anomaly which is the
> difference between the implementations of elf_phdr_size and
> elf_shdr_size.
> 
> This was introduced in 39bf7b9d0ae5 "libelf: check loops for running
> away".  That was part of the XSA-55 patchset.
> 
> Looking at the commit message I was concerned that the phdr and shdr
> counts might be excessive, and that libelf might get stuck in a loop
> with an unreasonably high iteration count.

Looking at that commit message, what I'm missing first of all is an
explanation of why long loops are a problem in the first place. Do
we need to be concerned of single threaded tool stacks? I think
such a problem would better be addressed at higher layers,
switching to multi threaded designs.

> I appear to have attempted to mitigate this by:
> 
>  (a) in the case of shdrs, by using elf_access_ok inside each
>     loop, on the principle that an out of control loop count
>     would walk off the end of the image and stop;
> 
>  (b) in the case of phdrs, making an explicit limit of
>     (image size) / sizeof(phdr).  NB that sizeof(phdr) is not the
>     actual size of phdrs; for a valid ELF it is a minimum.  This
>     is fine because we're computing a backstop, which does not
>     have to be entirely accurate.
> 
> I now see that (a) is ineffective because the image can specify its
> own stride for the iteration (the header size).  If it specifies a
> stride of zero, the maximum count is unbounded.

Well, a stride of zero is invalid (as is any smaller than the base ELF
spec's entry structure size).

> However, both count values are actually 16-bit.  So there is already a
> limit.  The extra code in elf_phdr_size is not necessary.

I'll make v2 of the patch then simply drop it.

> Another part of the motivation for a01b6d4 is to produce better
> messages for certain malformed ELFs.  There has been no assertion that
> any such ELFs (ie ELFs which would fail these new checks) actually
> exist or have caused trouble for anyone.
> 
> I think that it is a bad idea to add unnecessary checking to libelf.
> 
> libelf is a format parser on a security boundary.  Despite attempts to
> provide a safe dialect to write in, every piece of code in libelf
> risks security problems.

Any crash (e.g. as a result of a signal) of course is a problem also
for a multi threaded tool stack. But beyond that I'm not sure I
understand why you say "every piece". That's perhaps partly
because ...

> Also, this has demonstrated that the design principles underlying the
> safety of libelf are not properly understood.  IIRC they were
> explained in the XSA-55 00/ patch, but that's not sufficient.

... this explanation has not been recorded anywhere afaics, not
even in xsa.git.

> So I would prefer to:
> 
> Revert all of a01b6d4.

That was already done yesterday.

> Delete the header count check in elf_phdr_count and replace it with a
> compile-time assertion, in elf_phdr_count and in elf_shdr_count, that
> sizeof the count field is <= 2.  (This may need a new macro
> ELF_HANDLE_FIELD_SIZEOF implemented in terms of
> ELF__HANDLE_FIELD_TYPE.)

I'll wait with adding such an assertion until we've clarified the point
near the top of this reply.

> Add a comment near the top of libelf.h explaining the rules for
> programming inside libelf, to include:
> 
>   * Arithmetic on signed values is forbidden
> 
>   * Use of actual pointers into the ELF is forbidden
> 
>   * Division by non-constant values is forbidden
> 
>   * All loops must ensure that they have a reasonably low
>     iteration count
> 
>   * Code (including ELF format sanity checks) which is necessary
>     neither for safety nor for correct processing of correct files
>     should not be added to libelf.  It is not libelf's role to
>     be an ELF validator.

For all these points, I'd like it to also be clarified why those are
being established. I'd appreciate if you could submit a respective
patch based on the 00/ patch you refer to above, which I
understand you still have available in your mailbox.

For this last point - "ELF validator" is certainly the goal. But I
think it is reasonable for a library to at least check the input it
makes use of. If I had meant to fully validate the ELF image,
I would e.g. have had to reject images with zero e_phnum but
non-zero other e_ph* fields. Yet we don't care about those
other fields when there are no program headers (which, as
pointed out on IRC, is useless anyway, as then there's nothing
to load; only e_shnum can sensibly be zero).

Jan

Ian Jackson Dec. 9, 2016, 11:54 a.m. UTC | #10

Jan Beulich writes ("Re: [PATCH] libelf: Fix div0 issues in elf_{shdr,phdr}_count()"):
> On 08.12.16 at 18:28, <ian.jackson@eu.citrix.com> wrote:
> > Looking at the commit message I was concerned that the phdr and shdr
> > counts might be excessive, and that libelf might get stuck in a loop
> > with an unreasonably high iteration count.
> 
> Looking at that commit message, what I'm missing first of all is an
> explanation of why long loops are a problem in the first place. Do
> we need to be concerned of single threaded tool stacks? I think
> such a problem would better be addressed at higher layers,
> switching to multi threaded designs.

libxl is (essentially) singlethreaded.  It is very hard to write
correct multithreaded programs and I would be adamantly opposed to
anything that made libxl more thready inside.  Also, even with
threading, a long-running domain setup would make it difficult to kill
the domain, unless we have asynchronous termination of such a thread,
which would be a nightmare to implement correctly.

> > I now see that (a) is ineffective because the image can specify its
> > own stride for the iteration (the header size).  If it specifies a
> > stride of zero, the maximum count is unbounded.
> 
> Well, a stride of zero is invalid (as is any smaller than the base ELF
> spec's entry structure size).

That something is invalid does not mean it cannot be specified.

> > libelf is a format parser on a security boundary.  Despite attempts to
> > provide a safe dialect to write in, every piece of code in libelf
> > risks security problems.
> 
> Any crash (e.g. as a result of a signal) of course is a problem also
> for a multi threaded tool stack. But beyond that I'm not sure I
> understand why you say "every piece". That's perhaps partly
> because ...

All code risks bugs.

The programming environment in which libelf code is written is by far
from guaranteed safe.  Safety relies on programmers who write the code
following the rules.  The most common kinds of mistake are rendered
harmless, but other kinds of mistake are possible.

> > Add a comment near the top of libelf.h explaining the rules for
> > programming inside libelf, to include:
> > 
> >   * Arithmetic on signed values is forbidden
> > 
> >   * Use of actual pointers into the ELF is forbidden
> > 
> >   * Division by non-constant values is forbidden
> > 
> >   * All loops must ensure that they have a reasonably low
> >     iteration count
> > 
> >   * Code (including ELF format sanity checks) which is necessary
> >     neither for safety nor for correct processing of correct files
> >     should not be added to libelf.  It is not libelf's role to
> >     be an ELF validator.
> 
> For all these points, I'd like it to also be clarified why those are
> being established. I'd appreciate if you could submit a respective
> patch based on the 00/ patch you refer to above, which I
> understand you still have available in your mailbox.

I will do this.

> For this last point - "ELF validator" is certainly the goal. But I
> think it is reasonable for a library to at least check the input it
> makes use of. If I had meant to fully validate the ELF image,
> I would e.g. have had to reject images with zero e_phnum but
> non-zero other e_ph* fields. Yet we don't care about those
> other fields when there are no program headers (which, as
> pointed out on IRC, is useless anyway, as then there's nothing
> to load; only e_shnum can sensibly be zero).

(I think you omitted "not", so you meant "is certainly not the goal.)

IMO libelf should do enough tests to function correctly with correct
input, and avoid being vulnerable to incorrect input.  Code to perform
other tests introduces risk (of bugs, including security bugs such as
the one introduced in a01b6d4).

Such code has almost zero benefit because 1. we do not expect
ELF-generating tools to generate invalid ELFs so these checks' failure
paths will very likely never be executed anywhere, and 2. anyone
debugging such a hypothetical bad ELF will have a variety of tools
available which will do a much better job of analysing it.

Ian.

Jan Beulich Dec. 9, 2016, 1:03 p.m. UTC | #11

>>> On 09.12.16 at 12:54, <ian.jackson@eu.citrix.com> wrote:
> Jan Beulich writes ("Re: [PATCH] libelf: Fix div0 issues in 
> elf_{shdr,phdr}_count()"):
>> On 08.12.16 at 18:28, <ian.jackson@eu.citrix.com> wrote:
>> > Looking at the commit message I was concerned that the phdr and shdr
>> > counts might be excessive, and that libelf might get stuck in a loop
>> > with an unreasonably high iteration count.
>> 
>> Looking at that commit message, what I'm missing first of all is an
>> explanation of why long loops are a problem in the first place. Do
>> we need to be concerned of single threaded tool stacks? I think
>> such a problem would better be addressed at higher layers,
>> switching to multi threaded designs.
> 
> libxl is (essentially) singlethreaded.  It is very hard to write
> correct multithreaded programs and I would be adamantly opposed to
> anything that made libxl more thready inside.  Also, even with
> threading, a long-running domain setup would make it difficult to kill
> the domain, unless we have asynchronous termination of such a thread,
> which would be a nightmare to implement correctly.

But libxl is only a library - the question is whether xl (or whatever
containing process) may be sitting in on libelf invocation,
preventing the process(es) controlling another domain(s) from
making forward progress.

As to killing the domain - wouldn't killing the xl process doing the
creation followed by an "xl destroy" be sufficient? (As you know,
I don't know very much about how the tool stack works, but this
would seem a pretty natural pair of operations.)

>> > I now see that (a) is ineffective because the image can specify its
>> > own stride for the iteration (the header size).  If it specifies a
>> > stride of zero, the maximum count is unbounded.
>> 
>> Well, a stride of zero is invalid (as is any smaller than the base ELF
>> spec's entry structure size).
> 
> That something is invalid does not mean it cannot be specified.

And I didn't mean to imply that. What I mean to imply is that if
we guarded against too small (and hence invalid) entry size values
(which is one of the things the reverted patch does), we wouldn't
have this problem.

>> For this last point - "ELF validator" is certainly the goal. But I
>> think it is reasonable for a library to at least check the input it
>> makes use of. If I had meant to fully validate the ELF image,
>> I would e.g. have had to reject images with zero e_phnum but
>> non-zero other e_ph* fields. Yet we don't care about those
>> other fields when there are no program headers (which, as
>> pointed out on IRC, is useless anyway, as then there's nothing
>> to load; only e_shnum can sensibly be zero).
> 
> (I think you omitted "not", so you meant "is certainly not the goal.)

Indeed, thanks for noticing and deducing the right thing.

> IMO libelf should do enough tests to function correctly with correct
> input, and avoid being vulnerable to incorrect input.  Code to perform
> other tests introduces risk (of bugs, including security bugs such as
> the one introduced in a01b6d4).
> 
> Such code has almost zero benefit because 1. we do not expect
> ELF-generating tools to generate invalid ELFs so these checks' failure
> paths will very likely never be executed anywhere, and 2. anyone
> debugging such a hypothetical bad ELF will have a variety of tools
> available which will do a much better job of analysing it.

Well, in that case I'll have to teach myself to keep my hands off of
libelf/ as much as possible. But as you can see from the entry size
aspect further up, some extra checks may help (and might even
allow to weaken your "don't allow division by ..." criteria).

Jan

Ian Jackson Dec. 9, 2016, 3:44 p.m. UTC | #12

We recently discovered two near-miss in libelf:

* The intended method for limiting the phdr loop iteration count was
  not effective.  But happily this turned not to be important because
  the count field is only 16 bits.

* A recent commit accidentally introduced a division by zero
  vulnerability.

Subsequent discussion revealed that the design principles underlying
libelf's safety were not widely understood - because they were not
documented.

Initially I tried dealing with the loop safety problem by auditing the
code and adding a suitable comment next to each loop, stating a proof
sketch of the loop's safety.  I found that this quickly became
unworkable, because there are nested loops.  These nested loops did
not have badly unreasonable upper bounds but the complexity of the
analysis was unsuitable for security-critical review.

An upper bound on the work done in loops in libelf is necessary
because libelf may be called by the toolstack in a context where it
would block other work.  Specifically, libelf is called by libxl, and
libxl does all of its work within a single per-ctx lock.  libxl's
callers are not supposed to be required to invoke libxl on multiple
ctxs or with multiple processes simultaneously, and in any case
we don't want to generate and leak stuck toolstack processes.

So, in this series, I propose:

 * A new scheme for limiting the work done by libelf.  We track it
   explicitly, and check it on each iteration of every loop.  (This
   replaces a similar ad-hoc scheme used for copying image data.)

 * Documentation which states the safety design principles for libelf,
   and the coding rules which follow from those design principles.

After this series is done there are a few redundant loop safety
checks, from the previous approach:

 * There are a number of ad-hoc limits on string sizes, certain table
   sizes, etc.

 * There are calls to elf_access_ok which were intended to limit loop
   iteration counts (but are ineffective at doing so since the stride
   is controlled by the input image and might be zero).

I have chosen to retain these.  Removing them seems like an
unnecessary risk.  In particular, searching for and removing
certain elf_access_ok calls seems unwise.

Thanks,
Ian.

libelf: Fix div0 issues in elf_{shdr, phdr}_count()

Commit Message

Comments

Patch