diff mbox series

[v2] gitk: Make web links clickable

Message ID 20190829012702.GB3297@blackberry (mailing list archive)
State New, archived
Headers show
Series [v2] gitk: Make web links clickable | expand

Commit Message

Paul Mackerras Aug. 29, 2019, 1:27 a.m. UTC
This makes gitk look for http or https URLs in the commit description
and make the URLs clickable.  Clicking on them will invoke an external
web browser with the URL.

The web browser command is by default "xdg-open" on Linux, "open" on
MacOS, and "cmd /c start" on Windows.  The command can be changed in
the preferences window, and it can include parameters as well as the
command name.  If it is set to the empty string then URLs will no
longer be made clickable.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
v2: Match URLs anywhere, not just after [Bug]Link:.

 gitk | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 50 insertions(+), 1 deletion(-)

Comments

Pratyush Yadav Aug. 29, 2019, 6:32 p.m. UTC | #1
On 29/08/19 11:27AM, Paul Mackerras wrote:
> This makes gitk look for http or https URLs in the commit description
> and make the URLs clickable.  Clicking on them will invoke an external
> web browser with the URL.
> 
> The web browser command is by default "xdg-open" on Linux, "open" on
> MacOS, and "cmd /c start" on Windows.  The command can be changed in
> the preferences window, and it can include parameters as well as the
> command name.  If it is set to the empty string then URLs will no
> longer be made clickable.
> 
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
> v2: Match URLs anywhere, not just after [Bug]Link:.
> 
>  gitk | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/gitk b/gitk
> index a14d7a1..2a0d00c 100755
> --- a/gitk
> +++ b/gitk
> @@ -7016,6 +7016,7 @@ proc commit_descriptor {p} {
>  
>  # append some text to the ctext widget, and make any SHA1 ID
>  # that we know about be a clickable link.
> +# Also look for URLs of the form "http[s]://..." and make them web links.
>  proc appendwithlinks {text tags} {
>      global ctext linknum curview
>  
> @@ -7032,6 +7033,18 @@ proc appendwithlinks {text tags} {
>  	setlink $linkid link$linknum
>  	incr linknum
>      }
> +    set wlinks [regexp -indices -all -inline -line \
> +		    {https?://[^[:space:]]+} $text]

I know I suggested searching till the first non-whitespace character, 
but thinking more about, there are some problematic cases. Say someone 
has a commit message like:
  
  Foo bar baz (more details at https://example.com/hello)

Or like:

  Check out https://foo.com, https://bar.com

In the first example, the closing parenthesis gets included in the link, 
but shouldn't be. In the second, the comma after foo.com would be 
included in the link, but shouldn't be. So maybe use a more 
sophisticated regex?

A quick Google search gives out the following options [0][1].

[0] gives the following regex:

  https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)

It is kind of ugly to look at, and I'm not even sure if there are any 
syntax differences with Tcl's regex library.

[1] lists a bunch of regexes and which URLs they work on and which ones 
they don't. The smallest among them I found is:

  @^(https?|ftp)://[^\s/$.?#].[^\s]*$@iS

Again, I'm not sure how well this would work with Tcl's regex library, 
or how commonly these URL patterns appear in actual commit messages.  
Just something to consider.

[0] https://stackoverflow.com/questions/3809401/what-is-a-good-regular-expression-to-match-a-url
[1] https://mathiasbynens.be/demo/url-regex

[snip]
Paul Mackerras Sept. 13, 2019, 11:33 p.m. UTC | #2
On Fri, Aug 30, 2019 at 12:02:07AM +0530, Pratyush Yadav wrote:
> On 29/08/19 11:27AM, Paul Mackerras wrote:
> > This makes gitk look for http or https URLs in the commit description
> > and make the URLs clickable.  Clicking on them will invoke an external
> > web browser with the URL.
> > 
> > The web browser command is by default "xdg-open" on Linux, "open" on
> > MacOS, and "cmd /c start" on Windows.  The command can be changed in
> > the preferences window, and it can include parameters as well as the
> > command name.  If it is set to the empty string then URLs will no
> > longer be made clickable.
> > 
> > Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> > ---
> > v2: Match URLs anywhere, not just after [Bug]Link:.
> > 
> >  gitk | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 50 insertions(+), 1 deletion(-)
> > 
> > diff --git a/gitk b/gitk
> > index a14d7a1..2a0d00c 100755
> > --- a/gitk
> > +++ b/gitk
> > @@ -7016,6 +7016,7 @@ proc commit_descriptor {p} {
> >  
> >  # append some text to the ctext widget, and make any SHA1 ID
> >  # that we know about be a clickable link.
> > +# Also look for URLs of the form "http[s]://..." and make them web links.
> >  proc appendwithlinks {text tags} {
> >      global ctext linknum curview
> >  
> > @@ -7032,6 +7033,18 @@ proc appendwithlinks {text tags} {
> >  	setlink $linkid link$linknum
> >  	incr linknum
> >      }
> > +    set wlinks [regexp -indices -all -inline -line \
> > +		    {https?://[^[:space:]]+} $text]
> 
> I know I suggested searching till the first non-whitespace character, 
> but thinking more about, there are some problematic cases. Say someone 
> has a commit message like:
>   
>   Foo bar baz (more details at https://example.com/hello)
> 
> Or like:
> 
>   Check out https://foo.com, https://bar.com
> 
> In the first example, the closing parenthesis gets included in the link, 
> but shouldn't be. In the second, the comma after foo.com would be 
> included in the link, but shouldn't be. So maybe use a more 
> sophisticated regex?

I did think about that, but it seems to be impossible to get it right
in all cases, so I went for simple and obvious.  In particular I don't
see how to handle the common case of a '.' immediately following the
URL, since '.' is a legal character in a URL.

> A quick Google search gives out the following options [0][1].
> 
> [0] gives the following regex:
> 
>   https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)
> 
> It is kind of ugly to look at, and I'm not even sure if there are any 
> syntax differences with Tcl's regex library.
> 
> [1] lists a bunch of regexes and which URLs they work on and which ones 
> they don't. The smallest among them I found is:
> 
>   @^(https?|ftp)://[^\s/$.?#].[^\s]*$@iS
> 
> Again, I'm not sure how well this would work with Tcl's regex library, 
> or how commonly these URL patterns appear in actual commit messages.  
> Just something to consider.
> 
> [0] https://stackoverflow.com/questions/3809401/what-is-a-good-regular-expression-to-match-a-url
> [1] https://mathiasbynens.be/demo/url-regex

I think I would be inclined to make the regex customizable, since that
would also allow the user to match ftp or other URLs if they want.
The only difficulty with that is if there are subexpressions, that
will change how we have to interpret the list returned by the
regexp -indices -all -inline command.

Paul.
Pratyush Yadav Sept. 14, 2019, 2:30 p.m. UTC | #3
On 14/09/19 09:33AM, Paul Mackerras wrote:
> On Fri, Aug 30, 2019 at 12:02:07AM +0530, Pratyush Yadav wrote:
> > On 29/08/19 11:27AM, Paul Mackerras wrote:
> > 
> > I know I suggested searching till the first non-whitespace character, 
> > but thinking more about, there are some problematic cases. Say someone 
> > has a commit message like:
> >   
> >   Foo bar baz (more details at https://example.com/hello)
> > 
> > Or like:
> > 
> >   Check out https://foo.com, https://bar.com
> > 
> > In the first example, the closing parenthesis gets included in the link, 
> > but shouldn't be. In the second, the comma after foo.com would be 
> > included in the link, but shouldn't be. So maybe use a more 
> > sophisticated regex?
> 
> I did think about that, but it seems to be impossible to get it right
> in all cases, so I went for simple and obvious.  In particular I don't
> see how to handle the common case of a '.' immediately following the
> URL, since '.' is a legal character in a URL.
> 
> > A quick Google search gives out the following options [0][1].
> > 
> > [0] gives the following regex:
> > 
> >   https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)
> > 
> > It is kind of ugly to look at, and I'm not even sure if there are any 
> > syntax differences with Tcl's regex library.
> > 
> > [1] lists a bunch of regexes and which URLs they work on and which ones 
> > they don't. The smallest among them I found is:
> > 
> >   @^(https?|ftp)://[^\s/$.?#].[^\s]*$@iS
> > 
> > Again, I'm not sure how well this would work with Tcl's regex library, 
> > or how commonly these URL patterns appear in actual commit messages.  
> > Just something to consider.
> > 
> > [0] https://stackoverflow.com/questions/3809401/what-is-a-good-regular-expression-to-match-a-url
> > [1] https://mathiasbynens.be/demo/url-regex
> 
> I think I would be inclined to make the regex customizable, since that
> would also allow the user to match ftp or other URLs if they want.
> The only difficulty with that is if there are subexpressions, that
> will change how we have to interpret the list returned by the
> regexp -indices -all -inline command.

That just puts the responsibility of parsing the URL on the user, it 
doesn't solve the problem.

I don't have any numbers, but I think most problematic cases are when 
there are some trailing characters. We aren't dealing with malicious 
actors that want to do something bad or make gitk crash. IMO it is 
reasonable to expect legal URLs in a commit message.

So instead of trying to encompass all possible legal URLs and removing 
all illegal URLs, how about using a simple regex for basic filtering to 
weed out some false positives, and then trimming illegal trailing 
characters. These trailing characters would most likely be comma, 
period, parenthesis, question marks, quotation marks, etc. This way the 
logic stays simple and we tackle more real world problems.

Sounds reasonable?
diff mbox series

Patch

diff --git a/gitk b/gitk
index a14d7a1..2a0d00c 100755
--- a/gitk
+++ b/gitk
@@ -7016,6 +7016,7 @@  proc commit_descriptor {p} {
 
 # append some text to the ctext widget, and make any SHA1 ID
 # that we know about be a clickable link.
+# Also look for URLs of the form "http[s]://..." and make them web links.
 proc appendwithlinks {text tags} {
     global ctext linknum curview
 
@@ -7032,6 +7033,18 @@  proc appendwithlinks {text tags} {
 	setlink $linkid link$linknum
 	incr linknum
     }
+    set wlinks [regexp -indices -all -inline -line \
+		    {https?://[^[:space:]]+} $text]
+    foreach l $wlinks {
+	set s2 [lindex $l 0]
+	set e2 [lindex $l 1]
+	set url [string range $text $s2 $e2]
+	incr e2
+	$ctext tag delete link$linknum
+	$ctext tag add link$linknum "$start + $s2 c" "$start + $e2 c"
+	setwlink $url link$linknum
+	incr linknum
+    }
 }
 
 proc setlink {id lk} {
@@ -7064,6 +7077,18 @@  proc setlink {id lk} {
     }
 }
 
+proc setwlink {url lk} {
+    global ctext
+    global linkfgcolor
+    global web_browser
+
+    if {$web_browser eq {}} return
+    $ctext tag conf $lk -foreground $linkfgcolor -underline 1
+    $ctext tag bind $lk <1> [list browseweb $url]
+    $ctext tag bind $lk <Enter> {linkcursor %W 1}
+    $ctext tag bind $lk <Leave> {linkcursor %W -1}
+}
+
 proc appendshortlink {id {pre {}} {post {}}} {
     global ctext linknum
 
@@ -7098,6 +7123,16 @@  proc linkcursor {w inc} {
     }
 }
 
+proc browseweb {url} {
+    global web_browser
+
+    if {$web_browser eq {}} return
+    # Use eval here in case $web_browser is a command plus some arguments
+    if {[catch {eval exec $web_browser [list $url] &} err]} {
+	error_popup "[mc "Error starting web browser:"] $err"
+    }
+}
+
 proc viewnextline {dir} {
     global canv linespc
 
@@ -11488,7 +11523,7 @@  proc create_prefs_page {w} {
 proc prefspage_general {notebook} {
     global NS maxwidth maxgraphpct showneartags showlocalchanges
     global tabstop limitdiffs autoselect autosellen extdifftool perfile_attrs
-    global hideremotes want_ttk have_ttk maxrefs
+    global hideremotes want_ttk have_ttk maxrefs web_browser
 
     set page [create_prefs_page $notebook.general]
 
@@ -11539,6 +11574,13 @@  proc prefspage_general {notebook} {
     pack configure $page.extdifff.l -padx 10
     grid x $page.extdifff $page.extdifft -sticky ew
 
+    ${NS}::entry $page.webbrowser -textvariable web_browser
+    ${NS}::frame $page.webbrowserf
+    ${NS}::label $page.webbrowserf.l -text [mc "Web browser" ]
+    pack $page.webbrowserf.l -side left
+    pack configure $page.webbrowserf.l -padx 10
+    grid x $page.webbrowserf $page.webbrowser -sticky ew
+
     ${NS}::label $page.lgen -text [mc "General options"]
     grid $page.lgen - -sticky w -pady 10
     ${NS}::checkbutton $page.want_ttk -variable want_ttk \
@@ -12310,6 +12352,7 @@  if {[tk windowingsystem] eq "win32"} {
     set bgcolor SystemWindow
     set fgcolor SystemWindowText
     set selectbgcolor SystemHighlight
+    set web_browser "cmd /c start"
 } else {
     set uicolor grey85
     set uifgcolor black
@@ -12317,6 +12360,11 @@  if {[tk windowingsystem] eq "win32"} {
     set bgcolor white
     set fgcolor black
     set selectbgcolor gray85
+    if {[tk windowingsystem] eq "aqua"} {
+	set web_browser "open"
+    } else {
+	set web_browser "xdg-open"
+    }
 }
 set diffcolors {red "#00a000" blue}
 set diffcontext 3
@@ -12390,6 +12438,7 @@  set config_variables {
     filesepbgcolor filesepfgcolor linehoverbgcolor linehoverfgcolor
     linehoveroutlinecolor mainheadcirclecolor workingfilescirclecolor
     indexcirclecolor circlecolors linkfgcolor circleoutlinecolor
+    web_browser
 }
 foreach var $config_variables {
     config_init_trace $var