mbox series

[0/2] fast-import input string handling bugs

Message ID 20190825080640.GA31453@sigill.intra.peff.net (mailing list archive)
Headers show
Series fast-import input string handling bugs | expand

Message

Jeff King Aug. 25, 2019, 8:06 a.m. UTC
On Sun, Aug 25, 2019 at 02:57:48AM -0400, Jeff King wrote:

> And I think this is actually a real bug in the current code! We keep a
> pointer to the encoding string, which survives because of the history.
> But that history is bounded, and we could have an indefinite number of
> changed files in the middle. If I modify t9300 like this:

Here are two patches. The first fixes the existing bug with "encoding",
and the second uses the approach I suggested to fix the leak you
noticed.

The second one does carry a greater risk of regression than your patch,
but I think it's worth it for the fact that it makes any other bugs
(like the "encoding" one) more obvious.

  [1/2]: fast-import: duplicate parsed encoding string
  [2/2]: fast-import: duplicate into history rather than passing ownership

 fast-import.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

-Peff

Comments

Junio C Hamano Aug. 26, 2019, 3:36 p.m. UTC | #1
Jeff King <peff@peff.net> writes:

> On Sun, Aug 25, 2019 at 02:57:48AM -0400, Jeff King wrote:
>
>> And I think this is actually a real bug in the current code! We keep a
>> pointer to the encoding string, which survives because of the history.
>> But that history is bounded, and we could have an indefinite number of
>> changed files in the middle. If I modify t9300 like this:
>
> Here are two patches. The first fixes the existing bug with "encoding",
> and the second uses the approach I suggested to fix the leak you
> noticed.
>
> The second one does carry a greater risk of regression than your patch,
> but I think it's worth it for the fact that it makes any other bugs
> (like the "encoding" one) more obvious.

Yeah, it may be worth the risk, given that this is quite early in
the cycle, so we have enough time to cook it in 'next' to see if
somebody screams ;-)

>
>   [1/2]: fast-import: duplicate parsed encoding string
>   [2/2]: fast-import: duplicate into history rather than passing ownership
>
>  fast-import.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> -Peff
Elijah Newren Aug. 26, 2019, 7:18 p.m. UTC | #2
On Sun, Aug 25, 2019 at 1:06 AM Jeff King <peff@peff.net> wrote:
>
> On Sun, Aug 25, 2019 at 02:57:48AM -0400, Jeff King wrote:
>
> > And I think this is actually a real bug in the current code! We keep a
> > pointer to the encoding string, which survives because of the history.
> > But that history is bounded, and we could have an indefinite number of
> > changed files in the middle. If I modify t9300 like this:
>
> Here are two patches. The first fixes the existing bug with "encoding",
> and the second uses the approach I suggested to fix the leak you
> noticed.
>
> The second one does carry a greater risk of regression than your patch,
> but I think it's worth it for the fact that it makes any other bugs
> (like the "encoding" one) more obvious.

I agree, both patches look good to me, and I particularly appreciate
some extra help to avoid making the same mistake again.  :-)

Just for good measure, I also went and tested these patches by running
the git filter-repo testsuite and by re-running the filter-repo timing
cases at https://public-inbox.org/git/CABPp-BGOz8nks0+Tdw5GyGqxeYR-3FF6FT5JcgVqZDYVRQ6qog@mail.gmail.com/.

While filter-repo only uses a subset of fast-export functionality, it
tests it with a variety of weird and unusual tiny test repos.  And the
timing cases run on three real-world repositories (git, rails, and
linux).  Everything looks good on all of these testcases.