Message ID | 3e2346029e33ef9405b87d72a32d96d0aafb524a.1371595499.git.yann.morin.1998@free.fr (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Yann, Sorry for the late reply... Le Wednesday 19 June 2013 à 00:45 +0200, Yann E. MORIN a écrit : > From: "Yann E. MORIN" <yann.morin.1998@free.fr> > > When searching for symbols, return the symbols sorted by relevance. > > Sorting is done as thus: > - first, symbols with a prompt, [1] > - then, smallest offset, [2] > - then, shortest match, [3] > - then, highest relative match, [4] > - finally, alphabetical sort [5] > > So, searching (eg.) for 'P.*CI' : Nobody would actually search for that, so that's not a particularly good example to determine whether your sort order is sane or not. > > [1] Symbols of interest are probably those with a prompt, as they can be > changed, while symbols with no prompt are only for info. Thus: > PCIEASPM comes before PCI_ATS This is not necessarily true. I often look for symbols which have no prompt, and the information I am looking for is exactly "does this symbol have a prompt or is its value determined automatically"? > [2] Symbols that match earlier in the name are to be preferred over > symbols which match later. Thus: > PCI_MSI comes before WDTPCI This makes some sense, although it could have some unexpected side effects (e.g. FOO_BAR_PCI would be listed before SOMETHING_PCI_BAZBAZ, right?) > [3] The shortest match is (IMHO) more interesting than a longer one. > Thus: > PCI comes before PCMCIA This makes sense too, but I'm sure there are cases where it will be confusing too, and alphabetical order would do it in part too. > [4] The relative match is the ratio of the length of the match against > the length of the symbol. The more of a symbol name we match, the > more instersting that symbol is. Thus: > PCIEAER comes before PCIEASPM This is an obscure sort rule and I'm sure it will add more confusion than it will help. Alphabetical order should really be good enough at this point. > > [5] As fallback, sort symbols alphabetically > > This heuristic tries hard to get interesting symbols first in the list. I know I am the one whose question triggered this work from you, but in the end the "response" seems disproportionate. Having 5 different ordering rules is a lot, and that's quite a bit of code, which while not the most complex in the world, is still far from trivial and may require maintenance work in the future. The result of the search will be read by a human. As such, the order of the results should be immediately obvious. Returning what the user is most probably looking for first in the list is not as important as making it possible for the user to search the results for what he/she is looking for, and this can only be the case if the user understands the sort order instinctively. The 5 rules you used aren't exactly that IMHO. This situation is completely different from a web search for example, where the search engine has to select the most relevant pages because listing them all is out of the question. The original order was random and that's the worse possible order, so I am all for improving it. But I think the sort order should be driven by simple and instinctive rules. This means 1 or 2 ordering rules, not 5. As I recall, my original proposal was "exact match first, then alphabetical order." 2 rules, instinctive and easy to document. > In any case, exact match can (as previously) be requested by using > start-of-line and end-of-line in the search regexp: ^PCI$ Actually this solved my original problem just fine. As a matter of fact, I stopped replying to the original thread shortly after learning this, because I kind of lost interest in the following discussion. So I think it is more important to make it clearer that regular expressions are allowed, than to come up with a brilliant sort order. I know the help page says it, but the prompt itself only asks for a "(sub)string" and it is not immediately obvious (to me at least) that regular expressions are considered substrings. BTW, if you really decide to commit your proposal despite my concerns, you will have to document the sorting rules in the help page. Thanks,
On 24.6.2013 09:57, Jean Delvare wrote: > Hi Yann, > > Sorry for the late reply... > > Le Wednesday 19 June 2013 à 00:45 +0200, Yann E. MORIN a écrit : >> From: "Yann E. MORIN" <yann.morin.1998@free.fr> >> >> When searching for symbols, return the symbols sorted by relevance. >> >> Sorting is done as thus: >> - first, symbols with a prompt, [1] >> - then, smallest offset, [2] >> - then, shortest match, [3] >> - then, highest relative match, [4] >> - finally, alphabetical sort [5] >> >> So, searching (eg.) for 'P.*CI' : > > Nobody would actually search for that, so that's not a particularly good > example to determine whether your sort order is sane or not. > >> >> [1] Symbols of interest are probably those with a prompt, as they can be >> changed, while symbols with no prompt are only for info. Thus: >> PCIEASPM comes before PCI_ATS > > This is not necessarily true. I often look for symbols which have no > prompt, and the information I am looking for is exactly "does this > symbol have a prompt or is its value determined automatically"? > >> [2] Symbols that match earlier in the name are to be preferred over >> symbols which match later. Thus: >> PCI_MSI comes before WDTPCI > > This makes some sense, although it could have some unexpected side > effects (e.g. FOO_BAR_PCI would be listed before SOMETHING_PCI_BAZBAZ, > right?) > >> [3] The shortest match is (IMHO) more interesting than a longer one. >> Thus: >> PCI comes before PCMCIA > > This makes sense too, but I'm sure there are cases where it will be > confusing too, and alphabetical order would do it in part too. PCI does sort before PCMCIA alphabetically ;). Also, the match lenght will only be different when using regular expressions, so this rule will not matter in the usual case. >> [4] The relative match is the ratio of the length of the match against >> the length of the symbol. The more of a symbol name we match, the >> more instersting that symbol is. Thus: >> PCIEAER comes before PCIEASPM > > This is an obscure sort rule and I'm sure it will add more confusion > than it will help. Alphabetical order should really be good enough at > this point. When the prefix matches, then I agree that most people will expect the matches be alphabetically sorted. Maybe only do this comparison only if s1->so != 0, otherwise fallback to the strcmp() directly? Thanks, Michal -- To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jean, Michal, All, On 2013-06-24 09:57 +0200, Jean Delvare spake thusly: > Le Wednesday 19 June 2013 à 00:45 +0200, Yann E. MORIN a écrit : > > When searching for symbols, return the symbols sorted by relevance. > > > > Sorting is done as thus: > > - first, symbols with a prompt, [1] > > - then, smallest offset, [2] > > - then, shortest match, [3] > > - then, highest relative match, [4] > > - finally, alphabetical sort [5] > > > > So, searching (eg.) for 'P.*CI' : > > Nobody would actually search for that, so that's not a particularly good > example to determine whether your sort order is sane or not. Yes, this was just to explain the sorting algorithm with a simple example. [--SNIP--] > > This heuristic tries hard to get interesting symbols first in the list. > > I know I am the one whose question triggered this work from you, but in > the end the "response" seems disproportionate. Having 5 different > ordering rules is a lot, and that's quite a bit of code, which while not > the most complex in the world, is still far from trivial and may require > maintenance work in the future. OK, I understand you concerns. Michal, please do not apply this patch: I'll rewrite it with a simpler heuristic as Jean suggested, and will re-submit a complete pull-request later tonight. > So I think it is more important to make it clearer that regular > expressions are allowed, than to come up with a brilliant sort order. I > know the help page says it, but the prompt itself only asks for a > "(sub)string" and it is not immediately obvious (to me at least) that > regular expressions are considered substrings. OK, Ill see to rewite the title of the dialog box. Thanks for the review! Regards, Yann E. MORIN.
diff --git a/scripts/kconfig/symbol.c b/scripts/kconfig/symbol.c index ab8f4c8..d08e300 100644 --- a/scripts/kconfig/symbol.c +++ b/scripts/kconfig/symbol.c @@ -954,38 +954,116 @@ const char *sym_escape_string_value(const char *in) return res; } +struct sym_match { + struct symbol *sym; + off_t so, eo; +}; + +/* Compare matched symbols as thus: + * - first, symbols with a prompt, + * - then, smallest offset, + * - then, shortest match, + * - then, highest relative match, + * - finally, alphabetical sort + */ +static int sym_rel_comp( const void *sym1, const void *sym2 ) +{ + struct sym_match *s1 = *(struct sym_match **)sym1; + struct sym_match *s2 = *(struct sym_match **)sym2; + struct property *p; + bool p1 = false, p2 = false; + int l1, l2, r1, r2; + + for_all_prompts(s1->sym, p) { + p1 = true; + } + for_all_prompts(s2->sym, p) { + p2 = true; + } + if (p1 && !p2) + return -1; + if (!p1 && p2) + return 1; + + if (s1->so < s2->so) + return -1; + if (s1->so > s2->so) + return 1; + + l1 = s1->eo - s1->so; + l2 = s2->eo - s2->so; + if (l1>l2) + return 1; + if (l1<l2) + return -1; + + r1 = 100*l1 / strlen(s1->sym->name); + r2 = 100*l2 / strlen(s2->sym->name); + if (r1 > r2) + return -1; + if (r1 < r2) + return 1; + + return strcmp(s1->sym->name, s2->sym->name); +} + struct symbol **sym_re_search(const char *pattern) { struct symbol *sym, **sym_arr = NULL; + struct sym_match **sym_match_arr = NULL; int i, cnt, size; regex_t re; + regmatch_t match[1]; cnt = size = 0; /* Skip if empty */ if (strlen(pattern) == 0) return NULL; - if (regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB|REG_ICASE)) + if (regcomp(&re, pattern, REG_EXTENDED|REG_ICASE)) return NULL; for_all_symbols(i, sym) { + struct sym_match *tmp_sym_match; if (sym->flags & SYMBOL_CONST || !sym->name) continue; - if (regexec(&re, sym->name, 0, NULL, 0)) + if (regexec(&re, sym->name, 1, match, 0)) continue; if (cnt + 1 >= size) { - void *tmp = sym_arr; + void *tmp; size += 16; - sym_arr = realloc(sym_arr, size * sizeof(struct symbol *)); - if (!sym_arr) { - free(tmp); - return NULL; + tmp = realloc(sym_match_arr, size * sizeof(struct sym_match *)); + if (!tmp) { + goto sym_re_search_free; } + sym_match_arr = tmp; } sym_calc_value(sym); - sym_arr[cnt++] = sym; + tmp_sym_match = (struct sym_match*)malloc(sizeof(struct sym_match)); + if (!tmp_sym_match) + goto sym_re_search_free; + tmp_sym_match->sym = sym; + /* As regexec return 0, we know we have a match, so + * we can use match[0].rm_[se]o without further checks + */ + tmp_sym_match->so = match[0].rm_so; + tmp_sym_match->eo = match[0].rm_eo; + sym_match_arr[cnt++] = tmp_sym_match; } - if (sym_arr) + if (sym_match_arr) { + qsort(sym_match_arr, cnt, sizeof(struct sym_match*), sym_rel_comp); + sym_arr = malloc((cnt+1) * sizeof(struct symbol)); + if (!sym_arr) + goto sym_re_search_free; + for (i = 0; i < cnt; i++) + sym_arr[i] = sym_match_arr[i]->sym; sym_arr[cnt] = NULL; + } +sym_re_search_free: + if (sym_match_arr) { + for (i = 0; i < cnt; i++) + free(sym_match_arr[i]); + free(sym_match_arr); + } regfree(&re); return sym_arr;