Remove URL redirect project lookup

Message ID 20220225031135.4136158-1-robh@kernel.org (mailing list archive)
State Accepted
Series Remove URL redirect project lookup

Commit Message

Rob Herring Feb. 25, 2022, 3:11 a.m. UTC
Now that lore indexes all messages, there's no need to look up the project
for the message-id. If the project is not specified, then 'all' is used.

The primary benefit of this change is that cached accesses can now work
offline instead of splatting with a network error.

Signed-off-by: Rob Herring <robh@kernel.org>
---

My use case is twofold. First I want to speed up opening a thread by
having it fetched in the background and cached. Second, I want to be 
able to work offline by fetching a list of threads (my PW queue) in 
advance and using the offline copy. With a sufficiently long cache 
timeout, the cache works perfectly for this use. Though maybe a 'use the 
cache if there's a network failure' mode is needed instead of always 
timing out the cache.
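
The 'use the cache if there's a network failure' mode suggested above could look roughly like the following. This is only a sketch and not b4's cache code: the function name, the `cachedir` layout, and the injected `fetch` callable are all hypothetical, and it assumes network errors surface as `OSError` (which covers `urllib.error.URLError` and the exceptions raised by requests).

```python
import hashlib
import os
import time
from typing import Callable, Optional


def fetch_with_stale_fallback(url: str, cachedir: str, max_age: float,
                              fetch: Callable[[str], bytes]) -> Optional[bytes]:
    """Return data for url, preferring a fresh cache and tolerating network errors.

    Hypothetical helper: 'fetch' stands in for whatever performs the download.
    """
    os.makedirs(cachedir, exist_ok=True)
    cachefile = os.path.join(cachedir, hashlib.sha256(url.encode()).hexdigest())
    have_cache = os.path.exists(cachefile)

    # A fresh cache entry is served without touching the network.
    if have_cache and time.time() - os.path.getmtime(cachefile) < max_age:
        with open(cachefile, 'rb') as fh:
            return fh.read()

    try:
        data = fetch(url)
    except OSError:
        # Network failure: fall back to the expired entry if there is one.
        if have_cache:
            with open(cachefile, 'rb') as fh:
                return fh.read()
        return None

    # Successful download: refresh the cache entry.
    with open(cachefile, 'wb') as fh:
        fh.write(data)
    return data
```

With this shape the cache timeout only decides when a refresh is attempted; an expired entry is still served if the refresh fails.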

I also have this working using b4 to fetch my queue to an mbox and 
then using the 'use local mbox' option. This mostly works except for 
the handling of 'From ' in message bodies which is problematic for mbox 
format. The cache manages to avoid this problem.
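
For context on the 'From ' problem: in an mbox file a line starting with 'From ' marks the start of a new message, so the same text inside a body has to be escaped. One common convention is mboxrd quoting, sketched below. The function names are made up for illustration, and this is not a claim about how b4 or the 'use local mbox' option handles such lines.

```python
import re

# Body lines that start with "From ", optionally preceded by '>' characters.
_FROM_LINE = re.compile(rb'^(>*From )', re.MULTILINE)
# Escaped lines: at least one '>' in front of "From ".
_QUOTED_FROM_LINE = re.compile(rb'^>(>*From )', re.MULTILINE)


def mboxrd_escape(body: bytes) -> bytes:
    """Prepend one '>' to every body line matching ^>*From (mboxrd quoting)."""
    return _FROM_LINE.sub(rb'>\1', body)


def mboxrd_unescape(body: bytes) -> bytes:
    """Remove one '>' from every body line matching ^>+From (reverses the above)."""
    return _QUOTED_FROM_LINE.sub(rb'\1', body)
```

Both functions are meant for message bodies only; the real 'From ' separator line between messages must be left untouched.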

Rob

 b4/__init__.py | 23 +++++++----------------
 1 file changed, 7 insertions(+), 16 deletions(-)
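
As a standalone illustration of the commit message, the lookup-free URL construction can be sketched as below. `build_thread_url` and `DEFAULT_MIDMASK` are hypothetical names for this example; in b4 the midmask comes from `get_main_config()` and the logic lives in `get_pi_thread_by_msgid()`. The default midmask value shown is an assumption.

```python
import urllib.parse
from typing import Optional

# Assumed default for illustration; b4 reads the real value from its config.
DEFAULT_MIDMASK = 'https://lore.kernel.org/r/%s'


def build_thread_url(msgid: str, useproject: Optional[str] = None,
                     midmask: str = DEFAULT_MIDMASK) -> str:
    """Build the t.mbox.gz URL for a message-id without any network access."""
    qmsgid = urllib.parse.quote_plus(msgid)
    # Only the scheme and host of the midmask are needed.
    loc = urllib.parse.urlparse(midmask)
    # No project given: use the unified 'all' index.
    project = useproject or 'all'
    return '%s://%s/%s/%s/t.mbox.gz' % (loc.scheme, loc.netloc, project, qmsgid)
```

Because the URL is derived purely from the message-id and the configuration, a cached copy keyed on that URL can be found without first contacting the server.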

Comments

Rob Herring March 21, 2022, 4:41 p.m. UTC | #1
On Thu, Feb 24, 2022 at 9:11 PM Rob Herring <robh@kernel.org> wrote:
>
> Now that lore indexes all messages, there's no need to lookup the project
> for the message-id. If the project is not specified, then 'all' is used.
>
> The primary benefit of this change is that cached accesses can now work
> offline instead of splatting with a network error.
>
> Signed-off-by: Rob Herring <robh@kernel.org>
> ---
>
> My usecase is twofold. First I want to speed up opening a thread by
> having it fetched in the background and cached. Second, I want to be
> able to work offline by fetching a list of threads (my PW queue) in
> advance and using the offline copy. With a sufficiently long cache
> timeout, the cache works perfectly for this use. Though maybe a 'use the
> cache if there's a network failure' mode is needed instead of always
> timing out the cache.
>
> I also have this working using b4 to fetch my queue to an mbox and
> then using the 'use local mbox' option. This mostly works except for
> the handling of 'From ' in message bodies which is problematic for mbox
> format. The cache manages to avoid this problem.
>
> Rob
>
>  b4/__init__.py | 23 +++++++----------------
>  1 file changed, 7 insertions(+), 16 deletions(-)

Ping!

>
> diff --git a/b4/__init__.py b/b4/__init__.py
> index 0d506bbaa649..ec1a6da44144 100644
> --- a/b4/__init__.py
> +++ b/b4/__init__.py
> @@ -2235,6 +2235,9 @@ def get_pi_thread_by_url(t_mbx_url, nocache=False):
>          logger.critical('Grabbing thread from %s', t_mbx_url.split('://')[1])
>          session = get_requests_session()
>          resp = session.get(t_mbx_url)
> +        if resp.status_code == 404:
> +            logger.critical('That message-id is not known.')
> +            return None
>          if resp.status_code != 200:
>              logger.critical('Server returned an error: %s', resp.status_code)
>              return None
> @@ -2263,22 +2266,10 @@ def get_pi_thread_by_url(t_mbx_url, nocache=False):
>  def get_pi_thread_by_msgid(msgid, useproject=None, nocache=False, onlymsgids: Optional[set] = None):
>      qmsgid = urllib.parse.quote_plus(msgid)
>      config = get_main_config()
> -    # Grab the head from lore, to see where we are redirected
> -    midmask = config['midmask'] % qmsgid
> -    loc = urllib.parse.urlparse(midmask)
> -    if useproject:
> -        projurl = '%s://%s/%s' % (loc.scheme, loc.netloc, useproject)
> -    else:
> -        logger.info('Looking up %s', midmask)
> -        session = get_requests_session()
> -        resp = session.head(midmask)
> -        if resp.status_code < 300 or resp.status_code > 400:
> -            logger.critical('That message-id is not known.')
> -            return None
> -        # Pop msgid from the end of the redirect
> -        chunks = resp.headers['Location'].rstrip('/').split('/')
> -        projurl = '/'.join(chunks[:-1])
> -        resp.close()
> +    loc = urllib.parse.urlparse(config['midmask'])
> +    if not useproject:
> +        useproject = 'all'
> +    projurl = '%s://%s/%s' % (loc.scheme, loc.netloc, useproject)
>      t_mbx_url = '%s/%s/t.mbox.gz' % (projurl, qmsgid)
>      logger.debug('t_mbx_url=%s', t_mbx_url)
>
> --
> 2.32.0
>
Konstantin Ryabitsev March 21, 2022, 9:59 p.m. UTC | #2
On Mon, Mar 21, 2022 at 11:41:20AM -0500, Rob Herring wrote:
> > My usecase is twofold. First I want to speed up opening a thread by
> > having it fetched in the background and cached. Second, I want to be
> > able to work offline by fetching a list of threads (my PW queue) in
> > advance and using the offline copy. With a sufficiently long cache
> > timeout, the cache works perfectly for this use. Though maybe a 'use the
> > cache if there's a network failure' mode is needed instead of always
> > timing out the cache.
> >
> > I also have this working using b4 to fetch my queue to an mbox and
> > then using the 'use local mbox' option. This mostly works except for
> > the handling of 'From ' in message bodies which is problematic for mbox
> > format. The cache manages to avoid this problem.
> >
> > Rob
> >
> >  b4/__init__.py | 23 +++++++----------------
> >  1 file changed, 7 insertions(+), 16 deletions(-)
> 
> Ping!

Sorry, I've been a bit verklempt about things over the past few weeks. I'll
try to get to outstanding patches in very short order.

Best regards,
-K
Konstantin Ryabitsev June 14, 2022, 8:27 p.m. UTC | #3
On Thu, Feb 24, 2022 at 09:11:35PM -0600, Rob Herring wrote:
> Now that lore indexes all messages, there's no need to lookup the project
> for the message-id. If the project is not specified, then 'all' is used.

Rob:

Sorry for the long delay, but I did finally get around to it. I didn't quite
use your patch directly, because the goal is to also support non-lore
public-inbox installations, and they may not provide the unified index in
/all/, so we needed to keep the old lookup option available.

The change is in master as commit bfe5df6694c8115fa8402943b125c6e47c8eec08.

Thanks,
-K
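
One way to keep both behaviours available, as described in the reply above, is to use /all/ where the server is known to provide it and fall back to the redirect lookup otherwise. The sketch below is an illustration only and is not the code from the referenced commit: `resolve_project_url`, the `has_unified_index` flag, and the injected `head` callable (a stand-in for an HTTP HEAD request) are all hypothetical.

```python
import urllib.parse
from typing import Callable, Optional, Tuple

# Returns (status code, Location header or None); stands in for an HTTP HEAD request.
HeadFunc = Callable[[str], Tuple[int, Optional[str]]]


def resolve_project_url(msgid: str, midmask: str, head: HeadFunc,
                        useproject: Optional[str] = None,
                        has_unified_index: bool = True) -> Optional[str]:
    """Return the project base URL for msgid, or None if it cannot be found."""
    qmsgid = urllib.parse.quote_plus(msgid)
    loc = urllib.parse.urlparse(midmask)
    if useproject is None and has_unified_index:
        # Servers with a unified index need no lookup at all.
        useproject = 'all'
    if useproject:
        return '%s://%s/%s' % (loc.scheme, loc.netloc, useproject)

    # Old behaviour: ask the server where the message-id redirects to.
    status, location = head(midmask % qmsgid)
    if status < 300 or status > 400 or not location:
        return None
    # Drop the message-id from the end of the redirect target.
    return '/'.join(location.rstrip('/').split('/')[:-1])
```

Only the fallback branch needs the network, so installations with a unified index keep the offline-friendly behaviour from the original patch.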

Patch

diff --git a/b4/__init__.py b/b4/__init__.py
index 0d506bbaa649..ec1a6da44144 100644
--- a/b4/__init__.py
+++ b/b4/__init__.py
@@ -2235,6 +2235,9 @@  def get_pi_thread_by_url(t_mbx_url, nocache=False):
         logger.critical('Grabbing thread from %s', t_mbx_url.split('://')[1])
         session = get_requests_session()
         resp = session.get(t_mbx_url)
+        if resp.status_code == 404:
+            logger.critical('That message-id is not known.')
+            return None
         if resp.status_code != 200:
             logger.critical('Server returned an error: %s', resp.status_code)
             return None
@@ -2263,22 +2266,10 @@  def get_pi_thread_by_url(t_mbx_url, nocache=False):
 def get_pi_thread_by_msgid(msgid, useproject=None, nocache=False, onlymsgids: Optional[set] = None):
     qmsgid = urllib.parse.quote_plus(msgid)
     config = get_main_config()
-    # Grab the head from lore, to see where we are redirected
-    midmask = config['midmask'] % qmsgid
-    loc = urllib.parse.urlparse(midmask)
-    if useproject:
-        projurl = '%s://%s/%s' % (loc.scheme, loc.netloc, useproject)
-    else:
-        logger.info('Looking up %s', midmask)
-        session = get_requests_session()
-        resp = session.head(midmask)
-        if resp.status_code < 300 or resp.status_code > 400:
-            logger.critical('That message-id is not known.')
-            return None
-        # Pop msgid from the end of the redirect
-        chunks = resp.headers['Location'].rstrip('/').split('/')
-        projurl = '/'.join(chunks[:-1])
-        resp.close()
+    loc = urllib.parse.urlparse(config['midmask'])
+    if not useproject:
+        useproject = 'all'
+    projurl = '%s://%s/%s' % (loc.scheme, loc.netloc, useproject)
     t_mbx_url = '%s/%s/t.mbox.gz' % (projurl, qmsgid)
     logger.debug('t_mbx_url=%s', t_mbx_url)