Message ID | 20250207-245-partially-atomic-ref-updates-v1-0-e6a3690ff23a@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | refs: introduce support for partial reference transactions | expand |
Hi Karthik On 07/02/2025 07:34, Karthik Nayak wrote: > Git's reference updates are traditionally atomic I'm nitpicking but the updates aren't actually atomic, if a transaction updates two refs then it is possible for another process to see the one ref pointing to the new value and the other pointing to the old value. > - when updating > multiple references in a transaction, either all updates succeed or none > do. While this behavior is generally desirable, Isn't that the whole point of transactions? > it can be limiting in> certain scenarios, particularly with the reftable backend where batching > multiple reference updates is more efficient than performing them > sequentially. > > This series introduces support for partial reference transactions, > allowing individual reference updates to fail while letting others > proceed. This sounds like it's abusing ref transactions to implement a performance optimization. I wonder if it would be better to provide that via a different interface than shares the same underling implementation as transactions. That would make it clear to someone reading the code that individual ref updates can fail without affecting the rest. Burying that detail in a flag makes it rather easy to miss. Best Wishes Phillip > This capability is exposed through git-update-ref's > `--allow-partial` flag, which can be used in `--stdin` mode to batch > updates and handle failures gracefully. > The changes are structured to carefully build up this functionality: > > First, we clean up and consolidate the reference update checking logic. > This includes removing duplicate checks in the files backend and moving > refname tracking to the generic layer, which simplifies the codebase and > prepares it for the new feature. > > We then restructure the reftable backend's transaction preparation code, > extracting the update validation logic into a dedicated function. This > not only improves code organization but sets the stage for implementing > partial transaction support. > > With this groundwork in place, we implement the core partial transaction > support in the refs subsystem. This adds the necessary infrastructure to > track and report rejected updates while allowing transactions to proceed. > All reference backends are modified to support this behavior when enabled. > > Finally, we expose this functionality to users through > git-update-ref(1)'s `--allow-partial` flag, complete with test coverage > and documentation. The flag is specifically limited to `--stdin` mode > where batching multiple updates is most relevant. > > This enhancement improves Git's flexibility in handling reference > updates while maintaining the safety of atomic transactions by default. > It's particularly valuable for tools and workflows that need to handle > reference update failures gracefully without abandoning the entire batch > of updates. > > This series is based on top of bc204b7427 (The seventh batch, 2025-02-03). > There were no conflicts noticed with topics in 'seen' or 'next'. > > --- > Karthik Nayak (6): > refs/files: remove duplicate check in `split_symref_update()` > refs: move duplicate refname update check to generic layer > refs/files: remove duplicate duplicates check > refs/reftable: extract code from the transaction preparation > refs: implement partial reference transaction support > update-ref: add --allow-partial flag for stdin mode > > Documentation/git-update-ref.txt | 12 +- > builtin/update-ref.c | 53 ++++- > refs.c | 58 ++++- > refs.h | 22 ++ > refs/files-backend.c | 97 ++------ > refs/packed-backend.c | 49 ++-- > refs/refs-internal.h | 19 +- > refs/reftable-backend.c | 494 +++++++++++++++++++-------------------- > t/t1400-update-ref.sh | 191 +++++++++++++++ > 9 files changed, 633 insertions(+), 362 deletions(-) > --- > > > > --- > > base-commit: bc204b742735ae06f65bb20291c95985c9633b7f > change-id: 20241206-245-partially-atomic-ref-updates-9fe8b080345c > > Thanks > - Karthik > >
On 11/02/2025 17:03, Phillip Wood wrote: > On 07/02/2025 07:34, Karthik Nayak wrote: > >> This series introduces support for partial reference transactions, >> allowing individual reference updates to fail while letting others >> proceed. Thinking about this some more it is possible to skip the checking the current value of the ref so what is making the transaction fail? Is it D/F conflicts or something else? Best Wishes Phillip
Phillip Wood <phillip.wood123@gmail.com> writes: > Hi Karthik > > On 07/02/2025 07:34, Karthik Nayak wrote: >> Git's reference updates are traditionally atomic > > I'm nitpicking but the updates aren't actually atomic, if a transaction > updates two refs then it is possible for another process to see the one > ref pointing to the new value and the other pointing to the old value. > Good point. This is true in the case of the files backend, since updates involve locking individual files and during the commit phase, there is a possibility that one ref is updated while the other is yet to be (committing of the lock is not global but rather per ref file). However this is not the case with the reftable backend, there, updates are written to a new table and committed at the end after locking the table. So in the reftable backend, this is indeed atomic. >> - when updating >> multiple references in a transaction, either all updates succeed or none >> do. While this behavior is generally desirable, > > Isn't that the whole point of transactions? > Yup, this is the point of having a transaction indeed. >> it can be limiting in> certain scenarios, particularly with the reftable backend where batching >> multiple reference updates is more efficient than performing them >> sequentially. >> >> This series introduces support for partial reference transactions, >> allowing individual reference updates to fail while letting others >> proceed. > > This sounds like it's abusing ref transactions to implement a > performance optimization. I understand where you're coming from. This is definitely a stray from the regular atomic behavior, that transactions promise. But I would say this is more of an exception handling for the regular transaction mechanism and AFAIK this is also something that some of the databases support (see EXCEPTION in PostgreSQL). Overall, we're adding an exception handling support to the existing transaction interface. > I wonder if it would be better to provide that > via a different interface than shares the same underling implementation > as transactions. That would make it clear to someone reading the code > that individual ref updates can fail without affecting the rest. Burying > that detail in a flag makes it rather easy to miss. > Thinking this out, having a different interface sound good, but I feel we'd end up with the same structure as currently presented in this series. Only other way is to really split the implementation to support partial transactions as a entity of its own. In that case, we'd end up with code duplication. Do you think you can expand a little more here? > Best Wishes > > Phillip > Thanks for engaging! [snip]
Phillip Wood <phillip.wood123@gmail.com> writes: > On 11/02/2025 17:03, Phillip Wood wrote: >> On 07/02/2025 07:34, Karthik Nayak wrote: > > >>> This series introduces support for partial reference transactions, >>> allowing individual reference updates to fail while letting others >>> proceed. > > Thinking about this some more it is possible to skip the checking the > current value of the ref so what is making the transaction fail? Is it > D/F conflicts or something else? > It could be a multitude of issues, to name a some: - D/F conflicts - Unexpected old oid/target - dangling symrefs So this gives a hatch to partially commit parts of a transaction while also notifying the user about parts which failed. > Best Wishes > > Phillip
Hi Karthik On 12/02/2025 12:34, Karthik Nayak wrote: >> On 07/02/2025 07:34, Karthik Nayak wrote: >>> Git's reference updates are traditionally atomic >> >> I'm nitpicking but the updates aren't actually atomic, if a transaction >> updates two refs then it is possible for another process to see the one >> ref pointing to the new value and the other pointing to the old value. >> > > Good point. This is true in the case of the files backend, since updates > involve locking individual files and during the commit phase, there is a > possibility that one ref is updated while the other is yet to be > (committing of the lock is not global but rather per ref file). > > However this is not the case with the reftable backend, there, updates > are written to a new table and committed at the end after locking the > table. So in the reftable backend, this is indeed atomic. Ah, interesting. That explains why batching updates is so much more efficient when using the reftable backend. >>> This series introduces support for partial reference transactions, >>> allowing individual reference updates to fail while letting others >>> proceed. >> >> This sounds like it's abusing ref transactions to implement a >> performance optimization. > > I understand where you're coming from. This is definitely a stray from > the regular atomic behavior, that transactions promise. But I would say > this is more of an exception handling for the regular transaction > mechanism and AFAIK this is also something that some of the databases > support (see EXCEPTION in PostgreSQL). > > Overall, we're adding an exception handling support to the existing > transaction interface. My understanding of exception handling is that if an error occurs then an error handler is called (reading [1] that seems to be what PostgreSQL does as well). Is that what is being proposed here? I thought this series added a flag to ignore errors rather than provide a way to handle them. [1] https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING >> I wonder if it would be better to provide that >> via a different interface than shares the same underling implementation >> as transactions. That would make it clear to someone reading the code >> that individual ref updates can fail without affecting the rest. Burying >> that detail in a flag makes it rather easy to miss. >> > > Thinking this out, having a different interface sound good, but I feel > we'd end up with the same structure as currently presented in this > series. Only other way is to really split the implementation to support > partial transactions as a entity of its own. In that case, we'd end up > with code duplication. > > Do you think you can expand a little more here? I was thinking of a function that took a list of refs to update and made a best effort to update them, ignoring any updates that fail. My concern with adding a flag to ignore errors in the transaction api is that a partial transaction is a contradiction in terms. I'm also concerned that it seems to be ignoring all errors. I'd be happier if there was someway for the caller to specify which errors to ignore or if the caller could provide a callback to handle any errors. That way a caller could ignore d/f conflicts but still cause the transaction to fail if there was an i/o or could create a reference if it did not exist but leave it unchanged if it did exist. Best Wishes Phillip
On Wed, Feb 19, 2025 at 02:34:15PM +0000, Phillip Wood wrote: > On 12/02/2025 12:34, Karthik Nayak wrote: > > Thinking this out, having a different interface sound good, but I feel > > we'd end up with the same structure as currently presented in this > > series. Only other way is to really split the implementation to support > > partial transactions as a entity of its own. In that case, we'd end up > > with code duplication. > > > > Do you think you can expand a little more here? > > I was thinking of a function that took a list of refs to update and made a > best effort to update them, ignoring any updates that fail. > > My concern with adding a flag to ignore errors in the transaction api is > that a partial transaction is a contradiction in terms. I'm also concerned > that it seems to be ignoring all errors. I'd be happier if there was someway > for the caller to specify which errors to ignore or if the caller could > provide a callback to handle any errors. That way a caller could ignore d/f > conflicts but still cause the transaction to fail if there was an i/o or > could create a reference if it did not exist but leave it unchanged if it > did exist. This is a fair point I think, and it's also something that I called out in [1]. We shouldn't blanket-ignore all errors, but instead only ignore a well-known subset of errors and then record the exact failure reason. This allows for better and more unified error reporting, and would also allow us to easily build mechanisms where callers can specify that we are only expected to ignore a subset of those well-defined errors. But I also think that this is another selling point for continuing to build on top of the ref transaction. If we now want to record and ignore specific errors, only, then this falls out quite naturally from the current design of ref transactions. We already have all the ref updates in there, so we can then simply set an `enum ref_transaction_error` field for each of the failed updates. Eventually, I think we could also allow for modes where we declare only a subset of reference updates to be allowed to fail. I don't have a specific usecase for this, and don't think it needs to be implemented without one. But it's another thing that we could implement on top of transactions rather trivially. Patrick [1]: <Z6YxA4BlhNwbeYk-@pks.im>
Phillip Wood <phillip.wood123@gmail.com> writes: > Hi Karthik > > On 12/02/2025 12:34, Karthik Nayak wrote: >>> On 07/02/2025 07:34, Karthik Nayak wrote: >>>> Git's reference updates are traditionally atomic >>> >>> I'm nitpicking but the updates aren't actually atomic, if a transaction >>> updates two refs then it is possible for another process to see the one >>> ref pointing to the new value and the other pointing to the old value. >>> >> >> Good point. This is true in the case of the files backend, since updates >> involve locking individual files and during the commit phase, there is a >> possibility that one ref is updated while the other is yet to be >> (committing of the lock is not global but rather per ref file). >> >> However this is not the case with the reftable backend, there, updates >> are written to a new table and committed at the end after locking the >> table. So in the reftable backend, this is indeed atomic. > > Ah, interesting. That explains why batching updates is so much more > efficient when using the reftable backend. > >>>> This series introduces support for partial reference transactions, >>>> allowing individual reference updates to fail while letting others >>>> proceed. >>> >>> This sounds like it's abusing ref transactions to implement a >>> performance optimization. >> >> I understand where you're coming from. This is definitely a stray from >> the regular atomic behavior, that transactions promise. But I would say >> this is more of an exception handling for the regular transaction >> mechanism and AFAIK this is also something that some of the databases >> support (see EXCEPTION in PostgreSQL). >> >> Overall, we're adding an exception handling support to the existing >> transaction interface. > > My understanding of exception handling is that if an error occurs then > an error handler is called (reading [1] that seems to be what PostgreSQL > does as well). Is that what is being proposed here? I thought this > series added a flag to ignore errors rather than provide a way to handle > them. > That is correct, and while the current implementation is to ignore them. It does make way for building something like that in the future. > [1] > https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING > >>> I wonder if it would be better to provide that >>> via a different interface than shares the same underling implementation >>> as transactions. That would make it clear to someone reading the code >>> that individual ref updates can fail without affecting the rest. Burying >>> that detail in a flag makes it rather easy to miss. >>> >> >> Thinking this out, having a different interface sound good, but I feel >> we'd end up with the same structure as currently presented in this >> series. Only other way is to really split the implementation to support >> partial transactions as a entity of its own. In that case, we'd end up >> with code duplication. >> >> Do you think you can expand a little more here? > > I was thinking of a function that took a list of refs to update and made > a best effort to update them, ignoring any updates that fail. > Yes, this is what we want too, but in the context of transactions. Without transactions, this is already possible to build albeit on the user side with some simple scripts. > My concern with adding a flag to ignore errors in the transaction api is > that a partial transaction is a contradiction in terms. I'm also > concerned that it seems to be ignoring all errors. I'd be happier if > there was someway for the caller to specify which errors to ignore or if > the caller could provide a callback to handle any errors. That way a > caller could ignore d/f conflicts but still cause the transaction to > fail if there was an i/o or could create a reference if it did not exist > but leave it unchanged if it did exist. > Yeah, I think this is also something that Patrick raised and something I will tackle in the next version of this patch series. The next version will skip all user oriented errors (errors which can be fixed by changing the user input), while still catch system errors (I/O , memory ...). I also see how this can be extended to allow users to nit-pick which errors they don't care about. But that is something I don't plan to tackle for now. > Best Wishes > > Phillip Thanks for your inputs!