Message ID | 20201012201940.229694-1-smcallis@google.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | remote-curl: add testing for intelligent retry for HTTP | expand |
Sean McAllister <smcallis@google.com> writes: > +# generate a random 12 digit string > +gen_nonce() { > + test_copy_bytes 12 < /dev/urandom | tr -dc A-Za-z0-9 > +} What is the randomness requirement of this application? IOW, what breaks if we just change this to "echo 0123456789ab"? Or "date | git hash-object --stdin" for that matter? We'd want to make our tests more predictiable, not less. > diff --git a/t/lib-httpd/error-ntime.sh b/t/lib-httpd/error-ntime.sh > new file mode 100755 > index 0000000000..e4f91ab816 > --- /dev/null > +++ b/t/lib-httpd/error-ntime.sh > @@ -0,0 +1,79 @@ > +#!/bin/sh > + > +# Script to simulate a transient error code with Retry-After header set. > +# > +# PATH_INFO must be of the form /<nonce>/<times>/<retcode>/<retry-after>/<path> > +# (eg: /dc724af1/3/429/10/some/url) > +# > +# The <nonce> value uniquely identifies the URL, since we're simulating > +# a stateful operation using a stateless protocol, we need a way to "namespace" > +# URLs so that they don't step on each other. > +# > +# The first <times> times this endpoint is called, it will return the given > +# <retcode>, and if the <retry-after> is non-negative, it will set the > +# Retry-After head to that value. > +# > +# Subsequent calls will return a 302 redirect to <path>. > +# > +# Supported error codes are 429,502,503, and 504 > + > +print_status() { > + if [ "$1" -eq "302" ]; then printf "Status: 302 Found\n" > + elif [ "$1" -eq "429" ]; then printf "Status: 429 Too Many Requests\n" > + elif [ "$1" -eq "502" ]; then printf "Status: 502 Bad Gateway\n" > + elif [ "$1" -eq "503" ]; then printf "Status: 503 Service Unavailable\n" > + elif [ "$1" -eq "504" ]; then printf "Status: 504 Gateway Timeout\n" > + else > + printf "Status: 500 Internal Server Error\n" > + fi > + printf "Content-Type: text/plain\n" Style????? (I won't repeat this comment for the rest of this script) I briefly wondered "oh, are t/lib-httpd/* scripts excempt from the coding guidelines?" but a quick look at them tells me that that is not the case. > +} > + > +# read in path split into cmoponents > +IFS='/' > +tokens=($PATH_INFO) > + > +# break out code/retry parts of path > +nonce=${tokens[1]} > +times=${tokens[2]} > +code=${tokens[3]} > +retry=${tokens[4]} You said /bin/sh upfront. Don't use non-POSIX shell arrays. > + > +# get redirect path > +cnt=0 > +path="" > +for ((ii=0; ii < ${#PATH_INFO}; ii++)); do > + if [ "${PATH_INFO:${ii}:1}" == "/" ]; then > + let cnt=${cnt}+1 > + fi > + if [ "${cnt}" -eq 5 ]; then > + path="${PATH_INFO:${ii}}" > + break > + fi > +done > + > +# leave a cookie for this request/retry count > +state_file="request_${REMOTE_ADDR}_${nonce}_${times}_${code}_${retry}" > + > +if [ ! -f "$state_file" ]; then > + echo 0 > "$state_file" > +fi > + > + > +read cnt < "$state_file" > +if [ "$cnt" -lt "$times" ]; then > + let cnt=cnt+1 > + echo "$cnt" > "$state_file" > + > + # return error > + print_status "$code" > + if [ "$retry" -ge "0" ]; then > + printf "Retry-After: %s\n" "$retry" > + fi > +else > + # redirect > + print_status 302 > + printf "Location: %s?%s\n" "$path" "${QUERY_STRING}" > +fi > + > +echo > diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh > index 7df3c5373a..71d4307001 100755 > --- a/t/t5601-clone.sh > +++ b/t/t5601-clone.sh > @@ -756,6 +756,15 @@ test_expect_success 'partial clone using HTTP' ' > partial_clone "$HTTPD_DOCUMENT_ROOT_PATH/server" "$HTTPD_URL/smart/server" > ' > > +test_expect_success 'partial clone using HTTP with redirect' ' > + _NONCE=`gen_nonce` && export _NONCE && > + curl "$HTTPD_URL/error_ntime/${_NONCE}/3/502/10/smart/server" > /dev/null && > + curl "$HTTPD_URL/error_ntime/${_NONCE}/3/502/10/smart/server" > /dev/null && > + curl "$HTTPD_URL/error_ntime/${_NONCE}/3/502/10/smart/server" > /dev/null && These lines are not indented with HT? Don't redirect test output to /dev/null, which is done by test_expect_success for us. >/dev/null makes it less useful to run the test under "-v" option. > + partial_clone "$HTTPD_DOCUMENT_ROOT_PATH/server" "$HTTPD_URL/error_ntime/${_NONCE}/3/502/10/smart/server" > +' > + > + > # DO NOT add non-httpd-specific tests here, because the last part of this > # test script is only executed when httpd is available and enabled.
> Sean McAllister <smcallis@google.com> writes: > > > +# generate a random 12 digit string > > +gen_nonce() { > > + test_copy_bytes 12 < /dev/urandom | tr -dc A-Za-z0-9 > > +} > > What is the randomness requirement of this application? IOW, what > breaks if we just change this to "echo 0123456789ab"? > > Or "date | git hash-object --stdin" for that matter? > > We'd want to make our tests more predictiable, not less. The randomness requirement is just that I need nonces to be unique during a single run of the HTTP server as they uniquefy the files I put on disk to make the HTTP hack-ily stateful. I'd be fine with your date/hash-object solution, but I don't know that it will help make the tests more predictable. > > > diff --git a/t/lib-httpd/error-ntime.sh b/t/lib-httpd/error-ntime.sh > > new file mode 100755 > > index 0000000000..e4f91ab816 > > --- /dev/null > > +++ b/t/lib-httpd/error-ntime.sh > > @@ -0,0 +1,79 @@ > > +#!/bin/sh > > + > > +# Script to simulate a transient error code with Retry-After header set. > > +# > > +# PATH_INFO must be of the form /<nonce>/<times>/<retcode>/<retry-after>/<path> > > +# (eg: /dc724af1/3/429/10/some/url) > > +# > > +# The <nonce> value uniquely identifies the URL, since we're simulating > > +# a stateful operation using a stateless protocol, we need a way to "namespace" > > +# URLs so that they don't step on each other. > > +# > > +# The first <times> times this endpoint is called, it will return the given > > +# <retcode>, and if the <retry-after> is non-negative, it will set the > > +# Retry-After head to that value. > > +# > > +# Subsequent calls will return a 302 redirect to <path>. > > +# > > +# Supported error codes are 429,502,503, and 504 > > + > > +print_status() { > > + if [ "$1" -eq "302" ]; then printf "Status: 302 Found\n" > > + elif [ "$1" -eq "429" ]; then printf "Status: 429 Too Many Requests\n" > > + elif [ "$1" -eq "502" ]; then printf "Status: 502 Bad Gateway\n" > > + elif [ "$1" -eq "503" ]; then printf "Status: 503 Service Unavailable\n" > > + elif [ "$1" -eq "504" ]; then printf "Status: 504 Gateway Timeout\n" > > + else > > + printf "Status: 500 Internal Server Error\n" > > + fi > > + printf "Content-Type: text/plain\n" > > Style????? (I won't repeat this comment for the rest of this script) > > I briefly wondered "oh, are t/lib-httpd/* scripts excempt from the > coding guidelines?" but a quick look at them tells me that that is > not the case. > I mistakenly thought the Makefile in t/ was linting these as well. I've gone back through and fixed formatting issues and removed non-posix constructs. > > +} > > + > > +# read in path split into cmoponents > > +IFS='/' > > +tokens=($PATH_INFO) > > + > > +# break out code/retry parts of path > > +nonce=${tokens[1]} > > +times=${tokens[2]} > > +code=${tokens[3]} > > +retry=${tokens[4]} > > You said /bin/sh upfront. Don't use non-POSIX shell arrays. > > > + > > +# get redirect path > > +cnt=0 > > +path="" > > +for ((ii=0; ii < ${#PATH_INFO}; ii++)); do > > + if [ "${PATH_INFO:${ii}:1}" == "/" ]; then > > + let cnt=${cnt}+1 > > + fi > > + if [ "${cnt}" -eq 5 ]; then > > + path="${PATH_INFO:${ii}}" > > + break > > + fi > > +done > > + > > +# leave a cookie for this request/retry count > > +state_file="request_${REMOTE_ADDR}_${nonce}_${times}_${code}_${retry}" > > + > > +if [ ! -f "$state_file" ]; then > > + echo 0 > "$state_file" > > +fi > > + > > + > > +read cnt < "$state_file" > > +if [ "$cnt" -lt "$times" ]; then > > + let cnt=cnt+1 > > + echo "$cnt" > "$state_file" > > + > > + # return error > > + print_status "$code" > > + if [ "$retry" -ge "0" ]; then > > + printf "Retry-After: %s\n" "$retry" > > + fi > > +else > > + # redirect > > + print_status 302 > > + printf "Location: %s?%s\n" "$path" "${QUERY_STRING}" > > +fi > > + > > +echo > > diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh > > index 7df3c5373a..71d4307001 100755 > > --- a/t/t5601-clone.sh > > +++ b/t/t5601-clone.sh > > @@ -756,6 +756,15 @@ test_expect_success 'partial clone using HTTP' ' > > partial_clone "$HTTPD_DOCUMENT_ROOT_PATH/server" "$HTTPD_URL/smart/server" > > ' > > > > +test_expect_success 'partial clone using HTTP with redirect' ' > > + _NONCE=`gen_nonce` && export _NONCE && > > + curl "$HTTPD_URL/error_ntime/${_NONCE}/3/502/10/smart/server" > /dev/null && > > + curl "$HTTPD_URL/error_ntime/${_NONCE}/3/502/10/smart/server" > /dev/null && > > + curl "$HTTPD_URL/error_ntime/${_NONCE}/3/502/10/smart/server" > /dev/null && > > These lines are not indented with HT? > > Don't redirect test output to /dev/null, which is done by test_expect_success > for us. >/dev/null makes it less useful to run the test under "-v" option. > Fixed in v2. > > + partial_clone "$HTTPD_DOCUMENT_ROOT_PATH/server" "$HTTPD_URL/error_ntime/${_NONCE}/3/502/10/smart/server" > > +' > > + > > + > > # DO NOT add non-httpd-specific tests here, because the last part of this > > # test script is only executed when httpd is available and enabled.
Sean McAllister <smcallis@google.com> writes: >> Sean McAllister <smcallis@google.com> writes: >> >> > +# generate a random 12 digit string >> > +gen_nonce() { >> > + test_copy_bytes 12 < /dev/urandom | tr -dc A-Za-z0-9 >> > +} >> >> What is the randomness requirement of this application? IOW, what >> breaks if we just change this to "echo 0123456789ab"? >> >> Or "date | git hash-object --stdin" for that matter? >> >> We'd want to make our tests more predictiable, not less. > > The randomness requirement is just that I need nonces to be unique > during a single run of the HTTP server > as they uniquefy the files I put on disk to make the HTTP hack-ily > stateful. I'd be fine with your date/hash-object > solution, but I don't know that it will help make the tests more predictable. If so, would something like this be global_counter_for_nonce=0 gen_nonce () { global_counter_for_nonce=$(( global_counter_for_nonce + 1 )) && echo "$global_counter_for_nonce" } more appropriate? It is utterly predictable and yields the same answer only once during a single run.
Hi Junio, On Mon, 12 Oct 2020, Junio C Hamano wrote: > Sean McAllister <smcallis@google.com> writes: > > >> Sean McAllister <smcallis@google.com> writes: > >> > >> > +# generate a random 12 digit string > >> > +gen_nonce() { > >> > + test_copy_bytes 12 < /dev/urandom | tr -dc A-Za-z0-9 > >> > +} > >> > >> What is the randomness requirement of this application? IOW, what > >> breaks if we just change this to "echo 0123456789ab"? > >> > >> Or "date | git hash-object --stdin" for that matter? > >> > >> We'd want to make our tests more predictiable, not less. > > > > The randomness requirement is just that I need nonces to be unique > > during a single run of the HTTP server > > as they uniquefy the files I put on disk to make the HTTP hack-ily > > stateful. I'd be fine with your date/hash-object > > solution, but I don't know that it will help make the tests more predictable. > > If so, would something like this be > > global_counter_for_nonce=0 > gen_nonce () { > global_counter_for_nonce=$(( global_counter_for_nonce + 1 )) && > echo "$global_counter_for_nonce" > } > > more appropriate? It is utterly predictable and yields the same > answer only once during a single run. We should also consider using `test-tool genrandom <seed>` instead (where `<seed>` would have to be predictable, but probably would have to change between `gen_nonce()` calls). Ciao, Dscho
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > We should also consider using `test-tool genrandom <seed>` instead (where > `<seed>` would have to be predictable, but probably would have to change > between `gen_nonce()` calls). Yup, that is exactly why I asked Sean about randomness requirement. It turns out that they care only about uniqueness, so the comparison is between keeping an ever-incrementing counter and (1) echoing its current contents and/or (2) feeding it to "test-tool genrandom" as the seed. The complexity of the code _we_ need to write anew is the same, but echo would probably be a win in both the number of forks and cycles departments. Thanks.
diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh index d2edfa4c50..da1d4adfb4 100644 --- a/t/lib-httpd.sh +++ b/t/lib-httpd.sh @@ -132,6 +132,7 @@ prepare_httpd() { install_script incomplete-length-upload-pack-v2-http.sh install_script incomplete-body-upload-pack-v2-http.sh install_script broken-smart-http.sh + install_script error-ntime.sh install_script error-smart-http.sh install_script error.sh install_script apply-one-time-perl.sh @@ -308,3 +309,8 @@ check_access_log() { test_cmp "$1" access.log.stripped fi } + +# generate a random 12 digit string +gen_nonce() { + test_copy_bytes 12 < /dev/urandom | tr -dc A-Za-z0-9 +} diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf index afa91e38b0..77c495e164 100644 --- a/t/lib-httpd/apache.conf +++ b/t/lib-httpd/apache.conf @@ -117,6 +117,12 @@ Alias /auth/dumb/ www/auth/dumb/ SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH} SetEnv GIT_HTTP_EXPORT_ALL </LocationMatch> + +# This may be suffixed with any path for redirection, so it should come before +# any of the other aliases, particularly the /smart_*[^/]*/(.*) alias as that can +# match a lot of URLs +ScriptAlias /error_ntime/ error-ntime.sh/ + ScriptAlias /smart/incomplete_length/git-upload-pack incomplete-length-upload-pack-v2-http.sh/ ScriptAlias /smart/incomplete_body/git-upload-pack incomplete-body-upload-pack-v2-http.sh/ ScriptAliasMatch /error_git_upload_pack/(.*)/git-upload-pack error.sh/ @@ -137,6 +143,9 @@ ScriptAliasMatch /one_time_perl/(.*) apply-one-time-perl.sh/$1 <Files broken-smart-http.sh> Options ExecCGI </Files> +<Files error-ntime.sh> + Options ExecCGI +</Files> <Files error-smart-http.sh> Options ExecCGI </Files> diff --git a/t/lib-httpd/error-ntime.sh b/t/lib-httpd/error-ntime.sh new file mode 100755 index 0000000000..e4f91ab816 --- /dev/null +++ b/t/lib-httpd/error-ntime.sh @@ -0,0 +1,79 @@ +#!/bin/sh + +# Script to simulate a transient error code with Retry-After header set. +# +# PATH_INFO must be of the form /<nonce>/<times>/<retcode>/<retry-after>/<path> +# (eg: /dc724af1/3/429/10/some/url) +# +# The <nonce> value uniquely identifies the URL, since we're simulating +# a stateful operation using a stateless protocol, we need a way to "namespace" +# URLs so that they don't step on each other. +# +# The first <times> times this endpoint is called, it will return the given +# <retcode>, and if the <retry-after> is non-negative, it will set the +# Retry-After head to that value. +# +# Subsequent calls will return a 302 redirect to <path>. +# +# Supported error codes are 429,502,503, and 504 + +print_status() { + if [ "$1" -eq "302" ]; then printf "Status: 302 Found\n" + elif [ "$1" -eq "429" ]; then printf "Status: 429 Too Many Requests\n" + elif [ "$1" -eq "502" ]; then printf "Status: 502 Bad Gateway\n" + elif [ "$1" -eq "503" ]; then printf "Status: 503 Service Unavailable\n" + elif [ "$1" -eq "504" ]; then printf "Status: 504 Gateway Timeout\n" + else + printf "Status: 500 Internal Server Error\n" + fi + printf "Content-Type: text/plain\n" +} + +# read in path split into cmoponents +IFS='/' +tokens=($PATH_INFO) + +# break out code/retry parts of path +nonce=${tokens[1]} +times=${tokens[2]} +code=${tokens[3]} +retry=${tokens[4]} + +# get redirect path +cnt=0 +path="" +for ((ii=0; ii < ${#PATH_INFO}; ii++)); do + if [ "${PATH_INFO:${ii}:1}" == "/" ]; then + let cnt=${cnt}+1 + fi + if [ "${cnt}" -eq 5 ]; then + path="${PATH_INFO:${ii}}" + break + fi +done + +# leave a cookie for this request/retry count +state_file="request_${REMOTE_ADDR}_${nonce}_${times}_${code}_${retry}" + +if [ ! -f "$state_file" ]; then + echo 0 > "$state_file" +fi + + +read cnt < "$state_file" +if [ "$cnt" -lt "$times" ]; then + let cnt=cnt+1 + echo "$cnt" > "$state_file" + + # return error + print_status "$code" + if [ "$retry" -ge "0" ]; then + printf "Retry-After: %s\n" "$retry" + fi +else + # redirect + print_status 302 + printf "Location: %s?%s\n" "$path" "${QUERY_STRING}" +fi + +echo diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh index 7df3c5373a..71d4307001 100755 --- a/t/t5601-clone.sh +++ b/t/t5601-clone.sh @@ -756,6 +756,15 @@ test_expect_success 'partial clone using HTTP' ' partial_clone "$HTTPD_DOCUMENT_ROOT_PATH/server" "$HTTPD_URL/smart/server" ' +test_expect_success 'partial clone using HTTP with redirect' ' + _NONCE=`gen_nonce` && export _NONCE && + curl "$HTTPD_URL/error_ntime/${_NONCE}/3/502/10/smart/server" > /dev/null && + curl "$HTTPD_URL/error_ntime/${_NONCE}/3/502/10/smart/server" > /dev/null && + curl "$HTTPD_URL/error_ntime/${_NONCE}/3/502/10/smart/server" > /dev/null && + partial_clone "$HTTPD_DOCUMENT_ROOT_PATH/server" "$HTTPD_URL/error_ntime/${_NONCE}/3/502/10/smart/server" +' + + # DO NOT add non-httpd-specific tests here, because the last part of this # test script is only executed when httpd is available and enabled.
HTTP servers can sometimes throw errors, but that doesn't mean we should give up. If we have an error condition that we can reasonably retry on, then we should. This change is tricky because it requires a new CGI script to test as we need to be able to instruct the server to throw an error(s) before succeeding. We do this by encoding an error code and optional value for Retry-After into the URL, followed by the real endpoint: /error_ntime/dc724af1/<N>/429/10/smart/server This is a bit hacky, but really the best we can do since HTTP is fundamentally stateless. The URL is uniquefied by a nonce and we leave a breadcrumb on disk so all accesses after the first <N> redirect to the appropriate endpoint. Signed-off-by: Sean McAllister <smcallis@google.com> --- t/lib-httpd.sh | 6 +++ t/lib-httpd/apache.conf | 9 +++++ t/lib-httpd/error-ntime.sh | 79 ++++++++++++++++++++++++++++++++++++++ t/t5601-clone.sh | 9 +++++ 4 files changed, 103 insertions(+) create mode 100755 t/lib-httpd/error-ntime.sh