Message ID | cover.1606324509.git.me@ttaylorr.com (mailing list archive) |
---|---|
Headers | show |
Series | midx: prevent against racily disappearing packs | expand |
Taylor Blau <me@ttaylorr.com> writes: > Here are a couple of patches that Peff and I wrote after noticing a > problem where racily disappearing .idx files can cause a segfault. While > investigating, we fixed a related issue which is described and fixed in > the second patch. In the cover letter it won't affect the end result, but when talking about "race", it always is a good idea to explicitly mention both sides of the race. It is clear what one side is in the above (i.e. somebody who removes .idx file that is still in use), but it is not so clear who gets hit and segfaults. I am guessing that the other party is the user of .pack file(s) bypassing the corresponding .idx file(s) because the necessary data is mostly in .midx? Thanks.
On Wed, Nov 25, 2020 at 01:13:31PM -0800, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > > > Here are a couple of patches that Peff and I wrote after noticing a > > problem where racily disappearing .idx files can cause a segfault. While > > investigating, we fixed a related issue which is described and fixed in > > the second patch. > > In the cover letter it won't affect the end result, but when talking > about "race", it always is a good idea to explicitly mention both > sides of the race. It is clear what one side is in the above > (i.e. somebody who removes .idx file that is still in use), but it > is not so clear who gets hit and segfaults. > > I am guessing that the other party is the user of .pack file(s) > bypassing the corresponding .idx file(s) because the necessary data > is mostly in .midx? Yeah, the race reproduction in the second commit message can actually reproduce the segfault as well (it depends on the exact timing which error you get). So the segfault is in the reader, who is not checking the result of find_revindex_entry(). Arguably every call there should be checking for NULL, but in practice I think it would always be a bug: - we were somehow unable to open the index in order to generate the revindex (which is what happened here). But I think we are better off making sure that we can always do so, which is what this series does. - the caller asked about an object at a position beyond the number of objects in the packfile. This is a bug in the caller. So we could perhaps BUG() in find_revindex_entry() instead of returning NULL. A quick segfault accomplishes mostly the same thing, though the BUG() could distinguish the two cases more clearly. -Peff
On Wed, Nov 25, 2020 at 07:48:54PM -0500, Jeff King wrote: > Yeah, the race reproduction in the second commit message can actually > reproduce the segfault as well (it depends on the exact timing which > error you get). So the segfault is in the reader, who is not checking > the result of find_revindex_entry(). > > Arguably every call there should be checking for NULL, but in practice > I think it would always be a bug: > > - we were somehow unable to open the index in order to generate the > revindex (which is what happened here). But I think we are better > off making sure that we can always do so, which is what this series > does. > > - the caller asked about an object at a position beyond the number of > objects in the packfile. This is a bug in the caller. > > So we could perhaps BUG() in find_revindex_entry() instead of returning > NULL. A quick segfault accomplishes mostly the same thing, though the > BUG() could distinguish the two cases more clearly. Yeah, a find_revindex_entry() that returns NULL means that the caller is probably dead in the water. FWIW, this function gets touched by a series that I'm working on here: [1]. There, I think "returning NULL" is equivalent to "returning -1", and the problem exists there, too. We could return a different negative number, call BUG(), or do nothing other than what's written. I don't have any strong feelings, though. > -Peff Thanks, Taylor [1]: https://github.com/ttaylorr/git/blob/tb/on-disk-revindex-part-one/pack-revindex.c#L177-L201
On Wed, Nov 25, 2020 at 08:04:44PM -0500, Taylor Blau wrote: > > So we could perhaps BUG() in find_revindex_entry() instead of returning > > NULL. A quick segfault accomplishes mostly the same thing, though the > > BUG() could distinguish the two cases more clearly. > > Yeah, a find_revindex_entry() that returns NULL means that the caller is > probably dead in the water. > > FWIW, this function gets touched by a series that I'm working on here: > [1]. There, I think "returning NULL" is equivalent to "returning -1", > and the problem exists there, too. > > We could return a different negative number, call BUG(), or do nothing > other than what's written. I don't have any strong feelings, though. Yeah, I was suggesting _never_ returning a failure, and just hitting a BUG() within the function. So it does not matter then how you represent the error return type, because there isn't one. :) Looking over the callers, there are actually a few that check the return value and handle it sanely. I suspect they can never trigger in practice, given that the other callers would all segfault, and we have not seen any reports in the 15 years during which that has been the case. But perhaps some of them can be triggered by bogus pack data that nobody has ever run into in the real world. At any rate, I am content to ignore it until somebody feels like digging into each caller. A BUG() is only marginally better than an immediate segfault anyway, and I'd prefer not to disrupt more substantive improvements in the area. -Peff