mbox series

[0/2] Optionally support push options on up-to-date branches

Message ID SA1PR14MB4691B63C740F95D7D8A495628D2B2@SA1PR14MB4691.namprd14.prod.outlook.com (mailing list archive)
Headers show
Series Optionally support push options on up-to-date branches | expand

Message

Christopher Lindee March 12, 2024, 9:55 p.m. UTC
Some Git servers can take actions on a branch using push options; to do this,
the branch ref must be passed along with the push options.  However, if the
branch is up-to-date, then Git submits push options without any refs, so the
server is not given a branch to act upon and the push options become no-ops.

This means a user who does the following

    $ git commit
    $ git push -o option remote branch  # Performs option

will have a very different experience from a user who does

    $ git commit
    $ git push remote branch
    $ git push -o option remote branch  # No-op

The latter could easily happen when a user forgets to add an appropriate option
on push, but could also happen intentionally when a user wants to back up their
changes to a remote and then later chooses to submit those changes with a push
option (e.g. one that creates a pull request) to the same remote.

To work around this issue, the user needs to re-write history between the two
push commands in some way, either through force pushing a temporary value to
the remote or by amending their commit to change the timestamp.  Either can
impact users who pull from the server, as well as automated systems triggered
by a post-receive hook.  With these potential side-effects, this workaround can
be a bad option in some setups.

This changeset proposes to address this issue by adding an option to `push` and
`send-pack` that, when specified, will send refs where the old-oid and new-oid
are identical - instead of silently skipping these refs.  The first commit
introduces the `--send-up-to-date` option to toggle this behavior, while the
second commit updates the commands to output an `(up-to-date)` notice for each
branch with an identical old-oid and new-oid.

Notably, the `--force` option will not send a ref when the remote is up-to-date.

Chris Lindee (2):
  Teach send-pack & push to --send-up-to-date refs
  Add transport message for up-to-date references

 Documentation/git-push.txt      | 8 +++++++-
 Documentation/git-send-pack.txt | 7 +++++++
 builtin/push.c                  | 1 +
 builtin/send-pack.c             | 4 ++++
 send-pack.c                     | 2 +-
 send-pack.h                     | 3 ++-
 transport-helper.c              | 7 ++++++-
 transport.c                     | 3 +++
 transport.h                     | 1 +
 9 files changed, 32 insertions(+), 4 deletions(-)
        

base-commit: 3c2a3fdc388747b9eaf4a4a4f2035c1c9ddb26d0

Comments

Junio C Hamano March 13, 2024, 10:58 p.m. UTC | #1
Christopher Lindee <christopher.lindee@webpros.com> writes:

> Some Git servers can take actions on a branch using push options; to do this,
> the branch ref must be passed along with the push options.  However, if the
> branch is up-to-date, then Git submits push options without any refs, so the
> server is not given a branch to act upon and the push options become no-ops.

Yeah, the send-pack/receive-pack were written in the simpler days
back when "pushing the same object again to the ref was an absolute
no-op", and push options that breaks the expectation did not exist.

It makes sense to allow a (seemingly no-op) push to go through with
an option.

And even without custom push option, recording this seemingly no-op
push as a ref "update" does make sense--push certificate records
what object the pusher wanted each target ref to have, and omitting
a ref from the record only because the ref was already pointing at
the desired object is losing information.

So I doubly agree with the reasoning beind this change.

> This changeset proposes to address this issue by adding an option to `push` and
> `send-pack` that, when specified, will send refs where the old-oid and new-oid

"where" -> "even if"

> are identical - instead of silently skipping these refs.  The first commit
> introduces the `--send-up-to-date` option to toggle this behavior, while the
> second commit updates the commands to output an `(up-to-date)` notice for each
> branch with an identical old-oid and new-oid.
>
> Notably, the `--force` option will not send a ref when the remote is up-to-date.

And it makes sense *not* to update `--force` to do the no-op push,
becaues you may not want to (accidentally) force push a ref that
does not fast-forward.  As I already said, tying this with use of
the "-o" option is not sufficient.  So I agree we may want a new
option to trigger this behaviour.

A radical counter-proposal for the design is to update the client
side to do this unconditionally, without needing any new option.
For an already up-to-date ref, its only contribution to the cost of
"git push" is from its "Finally, tell the other end!" instruction,
which is in the order of 100 bytes per ref, and it should not add to
the pack generation cost at all [*].

    Side note: But we have to be careful---if the receiving end is
    totally up-to-date and there is absolutely no ref, I think the
    current code will not even call pack_objects(), and you'd
    probably want a similar optimization to avoid the cost to spawn
    pack_objects().

I do not know if "send-up-to-date" is a great name for the option,
though.

>
> Chris Lindee (2):
>   Teach send-pack & push to --send-up-to-date refs
>   Add transport message for up-to-date references
>
>  Documentation/git-push.txt      | 8 +++++++-
>  Documentation/git-send-pack.txt | 7 +++++++
>  builtin/push.c                  | 1 +
>  builtin/send-pack.c             | 4 ++++
>  send-pack.c                     | 2 +-
>  send-pack.h                     | 3 ++-
>  transport-helper.c              | 7 ++++++-
>  transport.c                     | 3 +++
>  transport.h                     | 1 +
>  9 files changed, 32 insertions(+), 4 deletions(-)
>         
>
> base-commit: 3c2a3fdc388747b9eaf4a4a4f2035c1c9ddb26d0
brian m. carlson March 14, 2024, 10:55 p.m. UTC | #2
On 2024-03-13 at 22:58:58, Junio C Hamano wrote:
> Christopher Lindee <christopher.lindee@webpros.com> writes:
> > This changeset proposes to address this issue by adding an option to `push` and
> > `send-pack` that, when specified, will send refs where the old-oid and new-oid
> 
> "where" -> "even if"
> 
> > are identical - instead of silently skipping these refs.  The first commit
> > introduces the `--send-up-to-date` option to toggle this behavior, while the
> > second commit updates the commands to output an `(up-to-date)` notice for each
> > branch with an identical old-oid and new-oid.
> >
> > Notably, the `--force` option will not send a ref when the remote is up-to-date.
> 
> And it makes sense *not* to update `--force` to do the no-op push,
> becaues you may not want to (accidentally) force push a ref that
> does not fast-forward.  As I already said, tying this with use of
> the "-o" option is not sufficient.  So I agree we may want a new
> option to trigger this behaviour.
> 
> A radical counter-proposal for the design is to update the client
> side to do this unconditionally, without needing any new option.

I'm not sure that would be a great idea.  Since it's a push, that will
trigger authentication, which may prompt the user (e.g., for a password
or token or for a YubiKey touch with FIDO2 SSH) and which they might be
able to easily avoid.  As a server operator, I also expect that there
are people doing lots of needless attempts at pushing in automated
systems (because with enough users, there will be at least some who do
bizarre or inefficient things), and I would prefer to avoid serving
those requests if I don't need to.  (For example, for us, reference
updates need to go through a distributed commit protocol to update
multiple replicas of the repository, and if there's no ref updates, then
we cut out multiple services which we don't need to contact.)

Note also that no-op ref updates cannot be simply omitted on the server
side because we need to verify that the old value for the ref is
correct, or we need to reject the operation as out of date.  While it is
_unlikely_ that the ref has changed since we fetched it from the server,
it's not guaranteed, especially if there's an expensive `pre-push` hook.

I do think the proposed change is valuable, though, and I'm in favour of
it with a suitable option.
Christopher Lindee March 14, 2024, 11:08 p.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:

> Christopher Lindee <christopher.lindee@webpros.com> writes:
> 
> > Some Git servers can take actions on a branch using push options; to do this,
> > the branch ref must be passed along with the push options.  However, if the
> > branch is up-to-date, then Git submits push options without any refs, so the
> > server is not given a branch to act upon and the push options become no-ops.
> 
> Yeah, the send-pack/receive-pack were written in the simpler days
> back when "pushing the same object again to the ref was an absolute
> no-op", and push options that breaks the expectation did not exist.
> 
> It makes sense to allow a (seemingly no-op) push to go through with
> an option.
> 
> And even without custom push option, recording this seemingly no-op
> push as a ref "update" does make sense--push certificate records
> what object the pusher wanted each target ref to have, and omitting
> a ref from the record only because the ref was already pointing at
> the desired object is losing information.

Hmm... does this mean that push certificate records should imply this new
option?

> So I doubly agree with the reasoning beind this change.
> 
> > This changeset proposes to address this issue by adding an option to `push` and
> > `send-pack` that, when specified, will send refs where the old-oid and new-oid
> 
> "where" -> "even if"

Excellent clarification!

> > are identical - instead of silently skipping these refs.  The first commit
> > introduces the `--send-up-to-date` option to toggle this behavior, while the
> > second commit updates the commands to output an `(up-to-date)` notice for each
> > branch with an identical old-oid and new-oid.
> >
> > Notably, the `--force` option will not send a ref when the remote is up-to-date.
> 
> And it makes sense *not* to update `--force` to do the no-op push,
> becaues you may not want to (accidentally) force push a ref that
> does not fast-forward.  As I already said, tying this with use of
> the "-o" option is not sufficient.  So I agree we may want a new
> option to trigger this behaviour.
> 
> A radical counter-proposal for the design is to update the client
> side to do this unconditionally, without needing any new option.
> For an already up-to-date ref, its only contribution to the cost of
> "git push" is from its "Finally, tell the other end!" instruction,
> which is in the order of 100 bytes per ref, and it should not add to
> the pack generation cost at all [*].

About 100 bytes per ref can grow when using a ref glob; for example,
`git push <remote> refs/tags/*:refs/tags/*` would push up ~1200 refs
to the Git repository, or about 120 KiB across ~80 Ethernet frames.
That's not much compared to the amount of data fetched during a push,
but it seems like a lot of unnecessary data in most instances and it
would add a tiny bit of latency (more on that in a second).

>     Side note: But we have to be careful---if the receiving end is
>     totally up-to-date and there is absolutely no ref, I think the
>     current code will not even call pack_objects(), and you'd
>     probably want a similar optimization to avoid the cost to spawn
>     pack_objects().

Good point.  This got me thinking about other potential side-effects.

What would happen if two actors push to the repository at the same
time with a lot of no-op branch changes?  For example,

Actors 1 & 2 (either actor could be first):
push< aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa refs/heads/main\0report-status ...
push< bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb refs/heads/branch
push< ffffffffffffffffffffffffffffffffffffffff .have
push< ...
push< 0000 

Actor 1:
push> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 5555555555555555555555555555555555555555 refs/heads/main\0 report-status-v2 ...
push> bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb refs/heads/branch
push> 0000

Actor 2:
push> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa refs/heads/main\0 report-status-v2 ...
push> bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb 4444444444444444444444444444444444444444 refs/heads/branch
push> 0000

If repository locking makes `push<` and `push>` a combined atomic operation,
then this could never happen.  Otherwise, I suspect Actor 2 would receive an
error (or at least a warning) since refs/heads/main is now 555... instead of
aaa....  Moreover, if an error and `--atomic` is specified, then everything
fails to update.

And, as mentioned before, with a large number of refs there will be
additional latency, which increases the chance of collisions.

> I do not know if "send-up-to-date" is a great name for the option,
> though.

Agreed; names are difficult.

I considered --always-send-refs, but thought it a bit vague since it
implies refs are sometimes not sent, but requires you to look at the
documentation to understand under what conditions refs would not be
sent.  I also feared there might be other conditions where refs are
not sent that I was unaware of.

I think something with "noop" makes sense; thing is, which do we use?
1) --force-nop
2) --force-noop
3) --force-no-op

I've seen all 3 ways of writing "no operation".  Everyone has their
preference (e.g. I did 1; Junio suggested 3 and later used 2) and if
the one we choose doesn't match up to a user's, typos seem inevitable.
Though I suppose we already have this issue with color/colour.

I also want to consider the audience.  To me, "no-op" feels low-level.
Meanwhile, "send-up-to-date" uses the same nomenclature as the UI
displays and feels more porcelain.  Frankly, this options seems more
low-level to me, since users probably want to consider the possible
side-effects on server-side before using it.  However, I could see
GitLab telling average users to use the option on their Push Options
(https://docs.gitlab.com/ee/user/project/push_options.html) docs.
I'm curious what others think.

> >
> > Chris Lindee (2):
> >   Teach send-pack & push to --send-up-to-date refs
> >   Add transport message for up-to-date references
> >
> >  Documentation/git-push.txt      | 8 +++++++-
> >  Documentation/git-send-pack.txt | 7 +++++++
> >  builtin/push.c                  | 1 +
> >  builtin/send-pack.c             | 4 ++++
> >  send-pack.c                     | 2 +-
> >  send-pack.h                     | 3 ++-
> >  transport-helper.c              | 7 ++++++-
> >  transport.c                     | 3 +++
> >  transport.h                     | 1 +
> >  9 files changed, 32 insertions(+), 4 deletions(-)
> >
> >
> > base-commit: 3c2a3fdc388747b9eaf4a4a4f2035c1c9ddb26d0
Junio C Hamano March 14, 2024, 11:29 p.m. UTC | #4
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

>> A radical counter-proposal for the design is to update the client
>> side to do this unconditionally, without needing any new option.
>
> I'm not sure that would be a great idea.

Thanks.  I was looking for a push-back ;-)

> Since it's a push, that will
> trigger authentication, which may prompt the user (e.g., for a password
> or token or for a YubiKey touch with FIDO2 SSH) and which they might be
> able to easily avoid.

Do you mean we first go unauthenticated to find out what commits are
at the tip of refs at the destination repository, and then switch to
authenticated push after we find out there is something that is
worth pushing?  I somehow thought we need an authenticated access
only for the initial ls-refs exchange.

> As a server operator, I also expect that there
> are people doing lots of needless attempts at pushing in automated
> systems (because with enough users, there will be at least some who do
> bizarre or inefficient things), and I would prefer to avoid serving
> those requests if I don't need to.  (For example, for us, reference
> updates need to go through a distributed commit protocol to update
> multiple replicas of the repository, and if there's no ref updates, then
> we cut out multiple services which we don't need to contact.)

Yes, but if you have an extra no-op ref update in the bunch, that
one is excluded from the set of changes to be synchronised across
replicas, no?

> Note also that no-op ref updates cannot be simply omitted on the server
> side because we need to verify that the old value for the ref is
> correct, or we need to reject the operation as out of date.

Yes, but isn't that something the user would rather want to find out
earlier rather than later?  Your push without no-op update may say
"ah, we are up to date" when we are behind or worse diverged.  If we
do the no-op update more often, we'd have more chance to catch such
a situation where the worldview the end-user's repository has is out
of sync with reality.

> I do think the proposed change is valuable, though, and I'm in favour of
> it with a suitable option.

In any case, I am OK with the feature.  I just was wondering if the
end-user experience may become simpler and easier if we did not have
to have a command line option.

Thanks.
Chris Torek March 15, 2024, 12:04 a.m. UTC | #5
On Thu, Mar 14, 2024 at 4:37 PM Junio C Hamano <gitster@pobox.com> wrote:
> In any case, I am OK with the feature.  I just was wondering if the
> end-user experience may become simpler and easier if we did not have
> to have a command line option.

Would it make sense for the *server* to request the option?

That is, you run:

    git push server-remote ref

and if the server at server-remote "wants" to know about "no-op
pushes" because it will do something for this case, it says so.
If the server says nothing, the client doesn't bother with the no-op
pushes.

(This could be either per-ref or global, too.)

Chris
Christopher Lindee March 15, 2024, 12:11 a.m. UTC | #6
brian m. carlson writes:

> On 2024-03-13 at 22:58:58, Junio C Hamano wrote:
> > Christopher Lindee <christopher.lindee@webpros.com> writes:
> > > This changeset proposes to address this issue by adding an option to `push` and
> > > `send-pack` that, when specified, will send refs where the old-oid and new-oid
> > 
> > "where" -> "even if"
> > 
> > > are identical - instead of silently skipping these refs.  The first commit
> > > introduces the `--send-up-to-date` option to toggle this behavior, while the
> > > second commit updates the commands to output an `(up-to-date)` notice for each
> > > branch with an identical old-oid and new-oid.
> > > 
> > > Notably, the `--force` option will not send a ref when the remote is up-to-date.
> > 
> > And it makes sense *not* to update `--force` to do the no-op push,
> > becaues you may not want to (accidentally) force push a ref that
> > does not fast-forward.  As I already said, tying this with use of
> > the "-o" option is not sufficient.  So I agree we may want a new
> > option to trigger this behaviour.
> > 
> > A radical counter-proposal for the design is to update the client
> > side to do this unconditionally, without needing any new option.
> 
> I'm not sure that would be a great idea.  Since it's a push, that will
> trigger authentication, which may prompt the user (e.g., for a password
> or token or for a YubiKey touch with FIDO2 SSH) and which they might be
> able to easily avoid.  As a server operator, I also expect that there
> are people doing lots of needless attempts at pushing in automated
> systems (because with enough users, there will be at least some who do
> bizarre or inefficient things), and I would prefer to avoid serving
> those requests if I don't need to.  (For example, for us, reference
> updates need to go through a distributed commit protocol to update
> multiple replicas of the repository, and if there's no ref updates, then
> we cut out multiple services which we don't need to contact.)

I agree.  I could easily see a nightly cron that backs up all refs to
a server that would then trigger server-side action.

I'm curious about authentication though, when I did a packet trace on
an up-to-date ref, I noticed it ended with:

push< ...
push< 0000
push> 0000

This suggests that something is sent back to the server.  Does that 
null entry not require authentication?

> Note also that no-op ref updates cannot be simply omitted on the server
> side because we need to verify that the old value for the ref is
> correct, or we need to reject the operation as out of date.  While it is
> _unlikely_ that the ref has changed since we fetched it from the server,
> it's not guaranteed, especially if there's an expensive `pre-push` hook.

Regards,
Chris.
Christopher Lindee March 15, 2024, 12:19 a.m. UTC | #7
Junio C Hamano writes:

> In any case, I am OK with the feature.  I just was wondering if the
> end-user experience may become simpler and easier if we did not have
> to have a command line option.

I think we should have the command line option, if only for backwards 
compatibility.  The question on my mind is: do we default to on or off?

Regards,
Chris.
brian m. carlson March 15, 2024, 12:39 a.m. UTC | #8
On 2024-03-14 at 23:29:40, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> >> A radical counter-proposal for the design is to update the client
> >> side to do this unconditionally, without needing any new option.
> >
> > I'm not sure that would be a great idea.
> 
> Thanks.  I was looking for a push-back ;-)
> 
> > Since it's a push, that will
> > trigger authentication, which may prompt the user (e.g., for a password
> > or token or for a YubiKey touch with FIDO2 SSH) and which they might be
> > able to easily avoid.
> 
> Do you mean we first go unauthenticated to find out what commits are
> at the tip of refs at the destination repository, and then switch to
> authenticated push after we find out there is something that is
> worth pushing?  I somehow thought we need an authenticated access
> only for the initial ls-refs exchange.

It depends on the server and the protocol.  Protocol v0 allows
unauthenticated ref exchange, and then the push itself can be
authenticated.  It is customary to require authentication on both
pieces, but not mandatory.

> > As a server operator, I also expect that there
> > are people doing lots of needless attempts at pushing in automated
> > systems (because with enough users, there will be at least some who do
> > bizarre or inefficient things), and I would prefer to avoid serving
> > those requests if I don't need to.  (For example, for us, reference
> > updates need to go through a distributed commit protocol to update
> > multiple replicas of the repository, and if there's no ref updates, then
> > we cut out multiple services which we don't need to contact.)
> 
> Yes, but if you have an extra no-op ref update in the bunch, that
> one is excluded from the set of changes to be synchronised across
> replicas, no?

As I said, we can't do that, because we have to verify that the old
value is correct.  Even if we did perform that optimization, once we
know we have a write, we have to contact every replica (in case the user
does indeed send us a pack, which is allowed by the protocol, even if
bizarre), whereas with a ref request or a read, we can satisfy that with
a single read replica.  I'm sure I can satisfy at least one to two
orders of magnitude more no-op push attempts with the current
optimization than I could if we didn't have it.  It might even be to the
point of three or more orders of magnitude.  (My guess is that reftable
will improve this even further, but I haven't yet measured.)

I'm sure this is also true for most other implementations; serving
reads, like ls-refs, are almost always substantially cheaper than writes
and most server implementations are highly optimized for serving large
numbers of reads due to CI farms and such.

> > Note also that no-op ref updates cannot be simply omitted on the server
> > side because we need to verify that the old value for the ref is
> > correct, or we need to reject the operation as out of date.
> 
> Yes, but isn't that something the user would rather want to find out
> earlier rather than later?  Your push without no-op update may say
> "ah, we are up to date" when we are behind or worse diverged.  If we
> do the no-op update more often, we'd have more chance to catch such
> a situation where the worldview the end-user's repository has is out
> of sync with reality.

Well, we were up to date at the time the ls-refs operation completed, or
the push wouldn't have the same old and new OIDs.  The client has to
declare the old ref value that it got from the ls-refs output in order
for the operation to succeed, since the server-side Git won't update the
references if the old value doesn't match.  (This is how the client
knows whether a push is a fast-forward or not: it computes it, and if
the remote side has an unknown value, then it's not.)  Other than
updating the push options, there's no functional benefit to making a
second request, since the ls-refs output already told us where the
branch is.  That's why the current optimization is functionally correct.

In any event, there are alternate implementations of the Git protocol
which did not implement our current optimization and did, for purposes
of mirroring, make excessive numbers of no-op requests, and we did have
to ask for that to be fixed.  I assure you, the no-op behaviour is
better for users because it's faster and substantially more likely to
succeed (because it is effectively a read-only ls-refs request), whereas
a new ref update is less efficient, slower, and substantially more
likely to fail (due to functional limitations on write throughput).

I also expect there will be poorly written `pre-push` and `pre-receive`
hooks which will not like getting a no-op update and not handle it
gracefully.  I'm pretty sure Git LFS will handle this just fine, but
lots of people have various shell scripts that probably will not.
brian m. carlson March 15, 2024, 12:57 a.m. UTC | #9
On 2024-03-15 at 00:04:25, Chris Torek wrote:
> On Thu, Mar 14, 2024 at 4:37 PM Junio C Hamano <gitster@pobox.com> wrote:
> > In any case, I am OK with the feature.  I just was wondering if the
> > end-user experience may become simpler and easier if we did not have
> > to have a command line option.
> 
> Would it make sense for the *server* to request the option?
> 
> That is, you run:
> 
>     git push server-remote ref
> 
> and if the server at server-remote "wants" to know about "no-op
> pushes" because it will do something for this case, it says so.
> If the server says nothing, the client doesn't bother with the no-op
> pushes.
> 
> (This could be either per-ref or global, too.)

This is an interesting idea.  I don't think this can be per-ref because
at the capabilities stage, we don't actually know what the user wants to
send, if anything.  v0 capabilities are too limited in size to declare
per-ref options, although I suppose v2 could, in theory, support that.

I do worry, though, that even if the server may be interested in
additional push options or push certificates, as Chris's series
indicates, it may not generally care about no-op ref updates, and thus,
it may not be granular enough to avoid the kinds of performance problems
that I've mentioned elsewhere in the thread.

For example, I know that GitLab uses push options (and GitHub has
support for them as well in some cases) and thus the proposed feature
would be useful there, but I suspect we'd still want regular no-op
pushes (without push options or push certificates) to not be sent for
performance reasons.
Christopher Lindee March 15, 2024, 8:58 a.m. UTC | #10
brian m. carlson <sandals@crustytoothpaste.net> writes:

> On 2024-03-14 at 23:29:40, Junio C Hamano wrote:
> > "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> > > As a server operator, I also expect that there
> > > are people doing lots of needless attempts at pushing in automated
> > > systems (because with enough users, there will be at least some who do
> > > bizarre or inefficient things), and I would prefer to avoid serving
> > > those requests if I don't need to.  (For example, for us, reference
> > > updates need to go through a distributed commit protocol to update
> > > multiple replicas of the repository, and if there's no ref updates, then
> > > we cut out multiple services which we don't need to contact.)
> >
> > Yes, but if you have an extra no-op ref update in the bunch, that
> > one is excluded from the set of changes to be synchronised across
> > replicas, no?
> 
> 
> As I said, we can't do that, because we have to verify that the old
> value is correct.  Even if we did perform that optimization, once we
> know we have a write, we have to contact every replica (in case the user
> does indeed send us a pack, which is allowed by the protocol, even if
> bizarre), whereas with a ref request or a read, we can satisfy that with
> a single read replica.  I'm sure I can satisfy at least one to two
> orders of magnitude more no-op push attempts with the current
> optimization than I could if we didn't have it.  It might even be to the
> point of three or more orders of magnitude.  (My guess is that reftable
> will improve this even further, but I haven't yet measured.)
> 
> 
> I'm sure this is also true for most other implementations; serving
> reads, like ls-refs, are almost always substantially cheaper than writes
> and most server implementations are highly optimized for serving large
> numbers of reads due to CI farms and such.
> 
> 
> > > Note also that no-op ref updates cannot be simply omitted on the server
> > > side because we need to verify that the old value for the ref is
> > > correct, or we need to reject the operation as out of date.
> >
> > Yes, but isn't that something the user would rather want to find out
> > earlier rather than later?  Your push without no-op update may say
> > "ah, we are up to date" when we are behind or worse diverged.  If we
> > do the no-op update more often, we'd have more chance to catch such
> > a situation where the worldview the end-user's repository has is out
> > of sync with reality.
> 
> 
> Well, we were up to date at the time the ls-refs operation completed, or
> the push wouldn't have the same old and new OIDs.  The client has to
> declare the old ref value that it got from the ls-refs output in order
> for the operation to succeed, since the server-side Git won't update the
> references if the old value doesn't match.  (This is how the client
> knows whether a push is a fast-forward or not: it computes it, and if
> the remote side has an unknown value, then it's not.)  Other than
> updating the push options, there's no functional benefit to making a
> second request, since the ls-refs output already told us where the
> branch is.  That's why the current optimization is functionally correct.
> 
> 
> In any event, there are alternate implementations of the Git protocol
> which did not implement our current optimization and did, for purposes
> of mirroring, make excessive numbers of no-op requests, and we did have
> to ask for that to be fixed.  I assure you, the no-op behaviour is
> better for users because it's faster and substantially more likely to
> succeed (because it is effectively a read-only ls-refs request), whereas
> a new ref update is less efficient, slower, and substantially more
> likely to fail (due to functional limitations on write throughput).
> 
> 
> I also expect there will be poorly written `pre-push` and `pre-receive`
> hooks which will not like getting a no-op update and not handle it
> gracefully.  I'm pretty sure Git LFS will handle this just fine, but
> lots of people have various shell scripts that probably will not.

Is this a potential avenue for DoS?

Clearly, I'm not the first to send no-op ref updates, so this change
doesn't create the problem, but it could allow non-coders to exploit
existing problems much more easily.

If this could be a problem, we can take the time now to solve it.  A
simple limit on the number of no-op refs received before the server
terminates with REF_STATUS_REMOTE_REJECT should be sufficient, right?
If we bake that expectation into the official Git implementation, then
other implementations will be empowered to take similar steps.

For at least one service[1], GitLab only supports Push Options on, at
most, 10 refs.  If other server operators have similar limitations,
then this is an easy bud to nip.

Regards,
Chris.

[1] https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/services/merge_requests/push_options_handler_service.rb?ref_type=heads#L5
Junio C Hamano March 15, 2024, 3:42 p.m. UTC | #11
Christopher Lindee <christopher.lindee@webpros.com> writes:

> Junio C Hamano writes:
>
>> In any case, I am OK with the feature.  I just was wondering if the
>> end-user experience may become simpler and easier if we did not have
>> to have a command line option.
>
> I think we should have the command line option, if only for backwards 
> compatibility.  The question on my mind is: do we default to on or off?

It seems like the concensus is we want this as an opt-in feature.
Defaults off, with an explicit command line option.

Thanks.
brian m. carlson March 15, 2024, 8:29 p.m. UTC | #12
On 2024-03-15 at 08:58:35, Christopher Lindee wrote:
> Is this a potential avenue for DoS?

No, it's not.  In our implementation, there is a functional limit on ref
updates and if you exceed it, the operation fails.  We also have rate
limiting that estimates future cost based on the cost of previous
requests and delays or fails requests which are projected to exceed a
reasonable threshold.  (Thus, you can make many more cheap requests, or
fewer expensive ones, your choice.)  All of this is per-repository, so
generally only you (and maybe your colleagues or collaborators)
experience negative consequences if you attempt excessive use.

I can't speak to other implementations, but robust rate limits are
typically common.  I'm sure all major implementations open to the public
have some sort of rate limiting because otherwise they'd be down a lot.

The difference is that failing operations and even well-explained,
well-documented rate limits cause a poor user experience, user
frustration, and user inquiries (e.g., support requests), as well as
possibly noisy alerting and potentially pages for engineers.  Anytime a
user experiences rate limits, they have to make a change in their
behaviour, and people don't like change.

I'd prefer we left the default to do the cheap no-op because, as I said,
that scales much better, and thus we allow users to do the obvious thing
for much longer and it just works for them.  That, in turn, provides
everyone a better experience.

Certainly, if people start using this option by default, then it will be
a problem if they engage in excessive use and server implementations
will probably scale worse, but usually users don't use non-default
options unless they need them, so I don't think your new proposed option
is a problem.