Forbidden requests for kernel.org/releases.json

Message ID q3gayrsulu424e2qr5eg7zfs2rgy5ucluuw73o2pjcxmehvvmp@qxy723fyda3x (mailing list archive)
State New

Commit Message

Daniel Gomez April 10, 2025, 8:05 a.m. UTC
We've started encountering "HTTP Error 403: Forbidden" errors in kdevops
when querying https://www.kernel.org/releases.json from our CI environments/
deployments. We're using a Python script with the urllib library to fetch the
latest kernel release information [1].

As a temporary workaround [2], we are testing a User-Agent header to mimic a
browser request. To solve this properly, we have the following questions:

* What is the recommended approach for automated tools to access
kernel.org/releases.json?

* Are there any rate limits we should be aware of?

* Would it be possible to serve releases.json from a CDN-backed subdomain to
reduce load on the main site? We could mirror the subdomain directory and
enable pointers for our datacenter network.

[1] https://github.com/linux-kdevops/kdevops/blob/main/scripts/generate_refs.py#L300
[2] add User-Agent headers to query kernel.org/releases.json:

Comments

Borislav Petkov April 10, 2025, 10:30 a.m. UTC | #1
On Thu, Apr 10, 2025 at 10:05:28AM +0200, Daniel Gomez wrote:
> We've started encountering "HTTP Error 403: Forbidden" errors in kdevops
> when querying https://www.kernel.org/releases.json from our CI environments/
> deployments. We're using a Python script with the urllib library to fetch the
> latest kernel release information [1].
> 
> As a temporary workaround [2], we are testing a User-Agent header to mimic a
> browser request. To solve this properly, we have the following questions:
> 
> * What is the recommended approach for automated tools to access
> kernel.org/releases.json?
> 
> * Are there any rate limits we should be aware of?
> 
> * Would it be possible to serve releases.json from a CDN-backed subdomain to
> reduce load on the main site? We could mirror the subdomain directory and
> enable pointers for our datacenter network.

+1

I have a script which tests whether a lore link: URL I'm adding to patches, is
correct. I.e. whether

https://lore.kernel.org/r/<Message-ID>

can be read.

What would be the suggested thing to do in such cases?

Thx.
Daniel Gomez April 10, 2025, 12:09 p.m. UTC | #2
On Thu, Apr 10, 2025 at 12:30:37PM +0200, Borislav Petkov wrote:
> On Thu, Apr 10, 2025 at 10:05:28AM +0200, Daniel Gomez wrote:
> > We've started encountering "HTTP Error 403: Forbidden" errors in kdevops
> > when querying https://www.kernel.org/releases.json from our CI environments/
> > deployments. We're using a Python script with the urllib library to fetch the
> > latest kernel release information [1].
> > 
> > As a temporary workaround [2], we are testing a User-Agent header to mimic a
> > browser request. To solve this properly, we have the following questions:
> > 
> > * What is the recommended approach for automated tools to access
> > kernel.org/releases.json?
> > 
> > * Are there any rate limits we should be aware of?
> > 
> > * Would it be possible to serve releases.json from a CDN-backed subdomain to
> > reduce load on the main site? We could mirror the subdomain directory and
> > enable pointers for our datacenter network.
> 
> +1
> 
> I have a script which tests whether a lore link: URL I'm adding to patches, is
> correct. I.e. whether
> 
> https://lore.kernel.org/r/<Message-ID>
> 
> can be read.
> 
> What would be the suggested thing to do in such cases?

FYI, I read this thread [1] recently where b4 was also failing on that type of
URL. Quoting the explanation:

"The anubis bot protection that I put in place yesterday required remapping some
of the mountpoints, such as the legacy /r/. Internally, b4 has been using /all/
instead of /r/ for a while, but people who had b4.midmask set to a URL with /r/
in it experienced problems. Fixing the /r/ mount fixed the problem."

"what is the canonical URL we should use for Link: tags? 
https://lore.kernel.org/r or /all?"

"You can continue to use /r/ in URLs, or just omit /r/ entirely. Don't use
/all/."

[1]
https://fosstodon.org/@brauner@mastodon.social/114281631562783020

> 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
Konstantin Ryabitsev April 10, 2025, 12:45 p.m. UTC | #3
On Thu, Apr 10, 2025 at 10:05:28AM +0200, Daniel Gomez wrote:
> We've started encountering "HTTP Error 403: Forbidden" errors in kdevops
> when querying https://www.kernel.org/releases.json from our CI environments/
> deployments. We're using a Python script with the urllib library to fetch the
> latest kernel release information [1].

Yes, I'm trying to deal with bots who don't identify themselves. We're seeing
many requests per second from user-agents like "python-requests/x.x" or
"Python-urllib/x.x" or "Java/x.x" etc, and it's impossible for us to tell good
bots from bad bots if they don't identify themselves properly.

> * What is the recommended approach for automated tools to access
> kernel.org/releases.json?

Set your user-agent to something like:

    "kdevops-ci/{version} (contact@address.here)"

-K
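
A minimal sketch of the suggestion above, using the same urllib approach as the kdevops script. The version string "1.0" and the contact address are placeholders chosen for illustration, not values from the thread; substitute whatever identifies your tool and a mailbox someone actually reads.

```python
import json
import urllib.request

RELEASES_URL = "https://www.kernel.org/releases.json"


def make_user_agent(tool: str, version: str, contact: str) -> str:
    """Format an identifying User-Agent as "tool/version (contact)"."""
    return f"{tool}/{version} ({contact})"


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Create a GET request that carries the identifying User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_releases(user_agent: str, url: str = RELEASES_URL,
                   timeout: float = 30.0) -> dict:
    """Download and parse releases.json (performs a network request)."""
    with urllib.request.urlopen(build_request(url, user_agent),
                                timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder identity: replace with your tool's real name and contact.
    ua = make_user_agent("kdevops-ci", "1.0", "ci-admin@example.org")
    for release in fetch_releases(ua)["releases"]:
        print(release["version"])
```

Keeping the header construction separate from the network call makes it easy to check the identity string without contacting kernel.org.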
Theodore Ts'o April 11, 2025, 2:18 p.m. UTC | #4
On Thu, Apr 10, 2025 at 08:45:30AM -0400, Konstantin Ryabitsev wrote:
> Set your user-agent to something like:
> 
>     "kdevops-ci/{version} (contact@address.here)"
>

Hi Konstantin,

Is this something you'd like continuous integration tools use in
general?  I'm using the git CLI called out from Go code, but I could
do something like

export GIT_HTTP_USER_AGENT="gce-xfstests-20250411/3-g42bcd9aa tytso@thunk.org"

(Where the version would be a lightly edited version of $(git
describe) from the xfstests-bld repository.)

The other alternative that I've tried is to replace git.kernel.org
with kernel.googlesource.com as the git mirror, which is supposed to be
only a few minutes behind git.kernel.org and presumably is closer to a
GCE VM from a network perspective.

Do you have any advice or preference about the advisability of these
approaches?

       	    	       	     	 	       - Ted
Konstantin Ryabitsev April 11, 2025, 3:25 p.m. UTC | #5
On Fri, Apr 11, 2025 at 10:18:00AM -0400, Theodore Ts'o wrote:
> Hi Konstantin,
> 
> Is this something you'd like continuous integration tools use in
> general?

Yes, I think that in general it's just good form for (well-behaved) bots to
report something other than the default library name and version in their
user-agent. When dealing with distributed crawler bots, a user-agent string is
often the only thing we have to rely on when blocking access, so if your bot
is indistinguishable from a hostile bot, you will be caught in the carnage.

> I'm using the git CLI called out from Go code, but I could
> do something like
> 
> export GIT_HTTP_USER_AGENT="gce-xfstests-20250411/3-g42bcd9aa tytso@thunk.org"

For actual git requests it's fine if it just has git's default user-agent.
Obviously, we are not going to start blocking that. :)

> The other alternative that I've tried is to replace git.kernel.org
> with kernel.googlesource.com as the git mirror, which is supposed to be only
> a few minutes behind git.kernel.org and presumably is closer to a GCE
> VM from a network perspective.

I'm fine with that as well -- just as long as you keep in mind that it can go
away at any time the way many Google things sometimes do. I'm also considering
running stable/next/mainline forks on several major forges as mirror-only
repos that are updated immediately after each push, so people can use them as
an alternative to googlesource.

-K
James Bottomley April 11, 2025, 4:48 p.m. UTC | #6
On Fri, 2025-04-11 at 11:25 -0400, Konstantin Ryabitsev wrote:
> On Fri, Apr 11, 2025 at 10:18:00AM -0400, Theodore Ts'o wrote:
[...]
> > The other alternative that I've tried is to replace git.kernel.org
> > with kernel.googlesource.com as the git mirror, which is supposed to be
> > only a few minutes behind git.kernel.org and presumably is closer
> > to a GCE VM from a network perspective.
> 
> I'm fine with that as well -- just as long as you keep in mind that
> it can go away at any time the way many Google things sometimes do.
> I'm also considering running stable/next/mainline forks on several
> major forges as mirror-only repos that are updated immediately after
> each push, so people can use them as an alternative to googlesource.

Just on this point, the load from AI bots is presumably mostly
emanating from various public clouds that provide AI services.  It does
seem to me that those clouds having mirror repositories (even if they
aren't public) that their AI training would use would help to lower the
AI bot load on kernel.org and provide faster training to the cloud that
did this (win/win).  Should kernel.org have an official program to
facilitate this?

Regards,

James
Laurent Pinchart April 11, 2025, 4:59 p.m. UTC | #7
On Fri, Apr 11, 2025 at 12:48:45PM -0400, James Bottomley wrote:
> On Fri, 2025-04-11 at 11:25 -0400, Konstantin Ryabitsev wrote:
> > On Fri, Apr 11, 2025 at 10:18:00AM -0400, Theodore Ts'o wrote:
> [...]
> > > The other alternative that I've tried is to replace git.kernel.org
> > > with kernel.googlesource.com as the git mirror, which is supposed to be
> > > only a few minutes behind git.kernel.org and presumably is closer
> > > to a GCE VM from a network perspective.
> > 
> > I'm fine with that as well -- just as long as you keep in mind that
> > it can go away at any time the way many Google things sometimes do.
> > I'm also considering running stable/next/mainline forks on several
> > major forges as mirror-only repos that are updated immediately after
> > each push, so people can use them as an alternative to googlesource.
> 
> Just on this point, the load from AI bots is presumably mostly
> emanating from various public clouds that provide AI services.  It does
> seem to me that those clouds having mirror repositories (even if they
> aren't public) that their AI training would use would help to lower the
> AI bot load on kernel.org and provide faster training to the cloud that
> did this (win/win).  Should kernel.org have an official program to
> facilitate this?

Do we want, as a community, to facilitate GPL violations?
Konstantin Ryabitsev April 11, 2025, 5 p.m. UTC | #8
On Fri, Apr 11, 2025 at 12:48:45PM -0400, James Bottomley wrote:
> > I'm fine with that as well -- just as long as you keep in mind that
> > it can go away at any time the way many Google things sometimes do.
> > I'm also considering running stable/next/mainline forks on several
> > major forges as mirror-only repos that are updated immediately after
> > each push, so people can use them as an alternative to googlesource.
> 
> Just on this point, the load from AI bots is presumably mostly
> emanating from various public clouds that provide AI services.  It does
> seem to me that those clouds having mirror repositories (even if they
> aren't public) that their AI training would use would help to lower the
> AI bot load on kernel.org and provide faster training to the cloud that
> did this (win/win).  Should kernel.org have an official program to
> facilitate this?

We already do make it very easy to mirror everything we have. You can set up
full replicas of git.kernel.org and lore.kernel.org that are updated within
seconds -- and I know of companies who maintain such replicas for their
internal needs.

However, I don't think that will have any measurable impact on LLM learning
bots, because it's less effort for such outfits to just buy residential DDoS
bot farms and throw them at the internet as fast and as hard as they can.

-K
Theodore Ts'o April 11, 2025, 5:09 p.m. UTC | #9
On Fri, Apr 11, 2025 at 11:25:36AM -0400, Konstantin Ryabitsev wrote:
> 
> For actual git requests it's fine if it just has git's default user-agent.
> Obviously, we are not going to start blocking that. :)

A while back we did get blocked once or twice; I assume because of
some IP address or IP range rate limit?  The following day, after the
Kernel Compilation Service (KCS) VM had been shut down and restarted
with a new IP address, we had no trouble getting the new linux-next
branch.

Would having a different user-agent help in that case?

> I'm fine with that as well -- just as long as you keep in mind that it can go
> away at any time the way many Google things sometimes do. I'm also considering
> running stable/next/mainline forks on several major forges as mirror-only
> repos that are updated immediately after each push, so people can use them as
> an alternative to googlesource.

What I might do is to have my system silently rewrite git.kernel.org
to one or more mirrors, with an automatic fallback if a particular
mirror disappears.  That does carry a risk if the mirror sticks
around but stops updating.  I suspect that's less likely to happen,
and presumably we can either (a) have some kind of heuristic for those
branches which are known to be regularly updated, or (b) rely on a
human to notice that particular failure case.

						- Ted
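
The fallback idea described above could be sketched roughly as follows. This is an illustration only, not the gce-xfstests implementation: the function names are hypothetical, the probe simply asks `git ls-remote` whether a remote answers, and it does not attempt the staleness heuristic mentioned in (a).

```python
import subprocess
from typing import Callable, Iterable, Optional


def git_reachable(url: str, timeout: float = 30.0) -> bool:
    """Return True if `git ls-remote --heads` lists at least one ref at url."""
    try:
        result = subprocess.run(
            ["git", "ls-remote", "--heads", url],
            capture_output=True, timeout=timeout, check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # git is missing, or the remote did not answer in time.
        return False
    return result.returncode == 0 and bool(result.stdout.strip())


def pick_remote(candidates: Iterable[str],
                probe: Callable[[str], bool] = git_reachable) -> Optional[str]:
    """Return the first candidate URL the probe accepts, or None."""
    for url in candidates:
        if probe(url):
            return url
    return None
```

Candidates are tried in order, so listing preferred mirrors first and git.kernel.org last gives the automatic fallback. Passing the probe as a parameter keeps the selection logic testable without network access; a mirror that is reachable but no longer updating would still be selected.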
James Bottomley April 11, 2025, 5:13 p.m. UTC | #10
On Fri, 2025-04-11 at 13:00 -0400, Konstantin Ryabitsev wrote:
> On Fri, Apr 11, 2025 at 12:48:45PM -0400, James Bottomley wrote:
> > > I'm fine with that as well -- just as long as you keep in mind
> > > that it can go away at any time the way many Google things
> > > sometimes do.  I'm also considering running stable/next/mainline
> > > forks on several major forges as mirror-only repos that are
> > > updated immediately after each push, so people can use them as an
> > > alternative to googlesource.
> > 
> > Just on this point, the load from AI bots is presumably mostly
> > emanating from various public clouds that provide AI services.  It
> > does seem to me that those clouds having mirror repositories (even
> > if they aren't public) that their AI training would use would help
> > to lower the AI bot load on kernel.org and provide faster training
> > to the cloud that did this (win/win).  Should kernel.org have an
> > official program to facilitate this?
> 
> We already do make it very easy to mirror everything we have. You can
> set up full replicas of git.kernel.org and lore.kernel.org that are
> updated within seconds -- and I know of companies who maintain such
> replicas for their internal needs.


OK, where's the URL describing this? in case I happened to know a major
cloud provider who might be interested ...


> However, I don't think that will have any measurable impact on LLM
> learning bots, because it's less effort for such outfits to just buy
> residential DDoS bot farms and throw them at the internet as fast and
> as hard as they can.


Well, carrot and stick: if you're busy locking out AI crawlers because
of DDoS farms, then even cloud based AI crawlers get caught, so it acts
as an incentive to cloud providers to set this up to attract business.

Regards,

James
Luck, Tony April 11, 2025, 6:23 p.m. UTC | #11
On Thu, Apr 10, 2025 at 08:45:30AM -0400, Konstantin Ryabitsev wrote:
> On Thu, Apr 10, 2025 at 10:05:28AM +0200, Daniel Gomez wrote:
> > We've started encountering "HTTP Error 403: Forbidden" errors in kdevops
> > when querying https://www.kernel.org/releases.json from our CI environments/
> > deployments. We're using a Python script with the urllib library to fetch the
> > latest kernel release information [1].
> 
> Yes, I'm trying to deal with bots who don't identify themselves. We're seeing
> many requests per second from user-agents like "python-requests/x.x" or
> "Python-urllib/x.x" or "Java/x.x" etc, and it's impossible for us to tell good
> bots from bad bots if they don't identify themselves properly.
> 
> > * What is the recommended approach for automated tools to access
> > kernel.org/releases.json?
> 
> Set your user-agent to something like:
> 
>     "kdevops-ci/{version} (contact@address.here)"

Would it help things if you just delayed response to *every*
query by a couple of seconds?

While I love it that lore.kernel.org answers my queries more
or less instantly, my user experience wouldn't suffer significantly
if I had to wait for a short (to humans) but long (to bots) time.

-Tony
Jonathan Corbet April 11, 2025, 8:08 p.m. UTC | #12
James Bottomley <James.Bottomley@HansenPartnership.com> writes:

> On Fri, 2025-04-11 at 11:25 -0400, Konstantin Ryabitsev wrote:
>> On Fri, Apr 11, 2025 at 10:18:00AM -0400, Theodore Ts'o wrote:
> [...]
>> > The other alternative that I've tried is to replace git.kernel.org
>> > with kernel.googlesource.com as the git mirror, which is supposed to be
>> > only a few minutes behind git.kernel.org and presumably is closer
>> > to a GCE VM from a network perspective.
>> 
>> I'm fine with that as well -- just as long as you keep in mind that
>> it can go away at any time the way many Google things sometimes do.
>> I'm also considering running stable/next/mainline forks on several
>> major forges as mirror-only repos that are updated immediately after
>> each push, so people can use them as an alternative to googlesource.
>
> Just on this point, the load from AI bots is presumably mostly
> emanating from various public clouds that provide AI services.

Have a look at Bright Data - they claim 100M+ *residential* IPs for
scraping.  They seem to operate a VPN service for "free", to use it you
just have to allow them to use your connection for this kind of stuff.

jon
Konstantin Ryabitsev April 11, 2025, 8:38 p.m. UTC | #13
On Fri, Apr 11, 2025 at 01:13:10PM -0400, James Bottomley wrote:
> > We already do make it very easy to mirror everything we have. You can
> > set up full replicas of git.kernel.org and lore.kernel.org that are
> > updated within seconds -- and I know of companies who maintain such
> > replicas for their internal needs.
> 
> OK, where's the URL describing this? in case I happened to know a major
> cloud provider who might be interested ...

I'll see if I can update our docs. I've published a few things in the past as
blog posts, but there isn't a consolidated document.

-K
Konstantin Ryabitsev April 11, 2025, 8:54 p.m. UTC | #14
On Thu, Apr 10, 2025 at 12:30:37PM +0200, Borislav Petkov wrote:
> +1
> 
> I have a script which tests whether a lore link: URL I'm adding to patches, is
> correct. I.e. whether
> 
> https://lore.kernel.org/r/<Message-ID>
> 
> can be read.
> 
> What would be the suggested thing to do in such cases?

You can continue doing it -- this is a lightweight operation. Just use a HEAD
request instead of a GET request.

E.g.:

    [[ $(curl -o/dev/null -sIw '%{response_code}' https://lore.kernel.org/all/20250411201912.2872-1-annie.li@oracle.co/) -gt 200 ]] && echo "invalid" || echo "valid"

-K
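
For scripts written in Python, a comparable HEAD check might look like the sketch below. It uses the standard http.client module and does not follow redirects, so a 3xx answer is reported separately; the curl one-liner above instead treats any code above 200 as invalid. The User-Agent value and the three-way classification are assumptions made for illustration.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Placeholder identity: replace with your tool's name and a real contact.
USER_AGENT = "link-checker/0.1 (you@example.org)"


def classify(status: int) -> str:
    """Map an HTTP status code to "valid", "redirect" or "invalid"."""
    if 200 <= status < 300:
        return "valid"
    if 300 <= status < 400:
        return "redirect"
    return "invalid"


def head_status(url: str, user_agent: str = USER_AGENT,
                timeout: float = 30.0) -> Tuple[int, Optional[str]]:
    """Send one HEAD request; return (status, Location header or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

A caller would pass the Link: URL to `head_status()` and feed the returned status to `classify()`; when the result is "redirect", the Location value shows where the server points next.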
Luis Chamberlain April 11, 2025, 8:56 p.m. UTC | #15
On Fri, Apr 11, 2025 at 01:00:24PM -0400, Konstantin Ryabitsev wrote:
> On Fri, Apr 11, 2025 at 12:48:45PM -0400, James Bottomley wrote:
> > > I'm fine with that as well -- just as long as you keep in mind that
> > > it can go away at any time the way many Google things sometimes do.
> > > I'm also considering running stable/next/mainline forks on several
> > > major forges as mirror-only repos that are updated immediately after
> > > each push, so people can use them as an alternative to googlesource.
> > 
> > Just on this point, the load from AI bots is presumably mostly
> > emanating from various public clouds that provide AI services.  It does
> > seem to me that those clouds having mirror repositories (even if they
> > aren't public) that their AI training would use would help to lower the
> > AI bot load on kernel.org and provide faster training to the cloud that
> > did this (win/win).  Should kernel.org have an official program to
> > facilitate this?
> 
> We already do make it very easy to mirror everything we have. You can set up
> full replicas of git.kernel.org and lore.kernel.org that are updated within
> seconds -- and I know of companies who maintain such replicas for their
> internal needs.

Does *that* setup also leverage CDN? My gathering is that kernel.org
would need to opt-in for that so perhaps it cannot scale well.

What I'm talking about, say, we have good CI citizen infrastructure that
not only does it not want to DDOS kernel.org, but *also* wants to
leverage its own good-mirror citizens, *but* without having to require
changes to userspace to use these mirrors, do we have a solution for
this?

Seems debian.org uses Varnish HTTP caching layers and not sure if that
may do it:

curl -I http://deb.debian.org/debian/
HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 6123
Server: Apache
X-Content-Type-Options: nosniff
X-Frame-Options: sameorigin
Referrer-Policy: no-referrer
X-Xss-Protection: 1
Permissions-Policy: interest-cohort=()
X-Clacks-Overhead: GNU Terry Pratchett
Content-Type: text/html;charset=UTF-8
Via: 1.1 varnish, 1.1 varnish
Accept-Ranges: bytes
Age: 0
Date: Fri, 11 Apr 2025 20:51:20 GMT
X-Served-By: cache-ams21082-AMS, cache-sjc1000099-SJC
X-Cache: HIT, MISS
X-Cache-Hits: 2, 0
X-Timer: S1744404680.387211,VS0,VE144
Vary: Accept-Encoding

  Luis
Konstantin Ryabitsev April 11, 2025, 9:04 p.m. UTC | #16
On Fri, Apr 11, 2025 at 01:56:12PM -0700, Luis Chamberlain wrote:
> > We already do make it very easy to mirror everything we have. You can set up
> > full replicas of git.kernel.org and lore.kernel.org that are updated within
> > seconds -- and I know of companies who maintain such replicas for their
> > internal needs.
> 
> Does *that* setup also leverage CDN? My gathering is that kernel.org
> would need to opt-in for that so perhaps it cannot scale well.

No, we can be completely unaware of it if it's for your in-house needs.

> What I'm talking about, say, we have good CI citizen infrastructure that
> not only does it not want to DDOS kernel.org, but *also* wants to
> leverage its own good-mirror citizens, *but* without having to require
> changes to userspace to use these mirrors, do we have a solution for
> this?

You can use git's insteadOf magic to make all git requests go to your local
mirror. E.g. by putting this into /etc/gitconfig on your CI nodes:

    [url "git://your.local.mirror.url"]
        insteadOf = git://git.kernel.org
        insteadOf = https://git.kernel.org

This way you can quickly swap between using upstream and using a local mirror
without modifying your scripts.

-K
Luis Chamberlain April 11, 2025, 11:37 p.m. UTC | #17
On Fri, Apr 11, 2025 at 05:04:36PM -0400, Konstantin Ryabitsev wrote:
> On Fri, Apr 11, 2025 at 01:56:12PM -0700, Luis Chamberlain wrote:
> > > We already do make it very easy to mirror everything we have. You can set up
> > > full replicas of git.kernel.org and lore.kernel.org that are updated within
> > > seconds -- and I know of companies who maintain such replicas for their
> > > internal needs.
> > 
> > Does *that* setup also leverage CDN? My gathering is that kernel.org
> > would need to opt-in for that so perhaps it cannot scale well.
> 
> No, we can be completely unaware of it if it's for your in-house needs.
> 
> > What I'm talking about, say, we have good CI citizen infrastructure that
> > not only does it not want to DDOS kernel.org, but *also* wants to
> > leverage its own good-mirror citizens, *but* without having to require
> > changes to userspace to use these mirrors, do we have a solution for
> > this?
> 
> You can use git's insteadOf magic to make all git requests go to your local
> mirror. E.g. by putting this into /etc/gitconfig on your CI nodes:
> 
>     [url "git://your.local.mirror.url"]
>         insteadOf = git://git.kernel.org
>         insteadOf = https://git.kernel.org

Beautiful, thanks!

And... do we know if most cloud providers mirror kernel.org? If so then
CIs that want to leverage cloud can use the trick above for each cloud
solution.

> This way you can quickly swap between using upstream and using a local mirror
> without modifying your scripts.

That's hella cool.

  Luis
Theodore Ts'o April 12, 2025, 2:19 a.m. UTC | #18
On Fri, Apr 11, 2025 at 04:38:07PM -0400, Konstantin Ryabitsev wrote:
> On Fri, Apr 11, 2025 at 01:13:10PM -0400, James Bottomley wrote:
> > > We already do make it very easy to mirror everything we have. You can
> > > set up full replicas of git.kernel.org and lore.kernel.org that are
> > > updated within seconds -- and I know of companies who maintain such
> > > replicas for their internal needs.
> > 
> > OK, where's the URL describing this? in case I happened to know a major
> > cloud provider who might be interested ...
> 
> I'll see if I can update our docs. I've published a few things in the past as
> blog posts, but there isn't a consolidated document.

So is this not completely up-to-date?

   https://www.kernel.org/mirroring-kernelorg-repositories.html

					- Ted
Borislav Petkov April 15, 2025, 9:01 a.m. UTC | #19
On Fri, Apr 11, 2025 at 04:54:25PM -0400, Konstantin Ryabitsev wrote:
> You can continue doing it -- this is a lightweight operation. Just use a HEAD
> request instead of a GET request.

Ok, here's what I have now:

                    headers = {
                        "User-Agent": "Boris patch massager script vp.py (bp@alien8.de)"
                    }
                    get = requests.head(link_url, headers=headers)
                    print(get.headers)

and for that done on the URL:

https://lore.kernel.org/20250414150951.5345-1-bp@kernel.org

it returns 302 with the Location header redirecting to the same thing but in
the /all/ range.

{'Server': 'nginx', 'Date': 'Tue, 15 Apr 2025 08:52:28 GMT', 'Content-Type': 'text/plain', 'Content-Length': '79', 'Connection': 'keep-alive', 'Age': '0', 'Location': 'http://lore.kernel.org/all/20250414150951.5345-1-bp@kernel.org/', 'Via': '1.1 varnish (Varnish/6.6)', 'X-Varnish': '17023135', 'X-Frame-Options': 'DENY', 'X-Content-Type-Options': 'nosniff', 'X-XSS-Protection': '1; mode=block', 'Strict-Transport-Security': 'max-age=15768001', 'Content-Security-Policy': "default-src 'self'; worker-src 'self' blob:; style-src 'self' 'unsafe-inline'; img-src https:"}

Now, if I query the URL in the /all/ range, it gives 301:

{'Server': 'nginx', 'Date': 'Tue, 15 Apr 2025 08:56:39 GMT', 'Content-Type': 'text/plain', 'Content-Length': '79', 'Connection': 'keep-alive', 'Age': '0', 'Location': 'http://lore.kernel.org/all/20250414150951.5345-1-bp@kernel.org/', 'Via': '1.1 varnish (Varnish/6.6)', 'X-Varnish': '9754343', 'X-Frame-Options': 'DENY', 'X-Content-Type-Options': 'nosniff', 'X-XSS-Protection': '1; mode=block', 'Strict-Transport-Security': 'max-age=15768001', 'Content-Security-Policy': "default-src 'self'; worker-src 'self' blob:; style-src 'self' 'unsafe-inline'; img-src https:"}

giving me the unencrypted http:// Location and if I do that it gives me 301
again to the *encrypted* URL:

{'Server': 'nginx', 'Date': 'Tue, 15 Apr 2025 08:57:15 GMT', 'Content-Type': 'text/html', 'Content-Length': '162', 'Connection': 'keep-alive', 'Location': 'https://lore.kernel.org/all/20250414150951.5345-1-bp@kernel.org'}

LOL.

So, what would be the best and the lowest overhead thing to use?

All I know is that people requested the https:// variant in Links in the past
so we probably should keep doing that.

Thx.

Patch

diff --git a/scripts/generate_refs.py b/scripts/generate_refs.py
index 5171414..41011cf 100755
--- a/scripts/generate_refs.py
+++ b/scripts/generate_refs.py
@@ -302,7 +302,9 @@  def kreleases(args) -> None:

     reflist = []
     if _check_connection("kernel.org", 80):
-        with urllib.request.urlopen("https://www.kernel.org/releases.json") as url:
+        _url = "https://www.kernel.org/releases.json"
+        req = urllib.request.Request(_url, headers={"User-Agent": "Mozilla/5.0"})
+        with urllib.request.urlopen(req) as url:
             data = json.load(url)

             for release in data["releases"]: