Weird word filtering in search (bug?) Topic is solved

General questions regarding the use of languages in Joomla! 3.x.

Moderator: General Support Moderators

froehli
Joomla! Enthusiast
Posts: 107
Joined: Thu Jun 18, 2009 5:03 pm

Weird word filtering in search (bug?)

Post by froehli » Tue Oct 05, 2021 6:13 pm

I just noticed that Joomla seems to apply some strange word filtering in its search component.
I would understand filtering simple stop words (such as "and", "of", etc.), and those do seem to be filtered. But several other words are filtered as well, and not very consistently; sometimes the submitted text doesn't even seem to reach the search providers (see below).

In English, the filtering seems less harsh, at least in my quick tests just now. My installation is in German, though, and there the words for numbers, for example, are filtered, such as "sieben" (seven): if I enter "sieben Schwerter" (seven swords), the search only looks for "Schwerter".

I'm actually writing my own search plugin, and with it I noticed that search requests whose text is apparently filtered out completely, such as "Dinge" (things) or "sieben dinge" (seven things), result in my search plugin receiving an empty text field. One difference with such empty queries is that the search result form doesn't even show the "found xx results" text. In contrast, it shows "found 0 results" when it did search for some "proper" words but there just weren't any actual results.
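
To make the behaviour described above concrete, here is a minimal Python sketch of stop word filtering. It is not Joomla's actual code: the function name `filter_query`, the two-word list `IGNORED_DE`, and the case-insensitive matching are all assumptions made for illustration, and the real German list is far longer.

```python
from __future__ import annotations


def filter_query(query: str, ignored_words: set[str]) -> str:
    """Return the query with every ignored word removed.

    Matching is case-insensitive here, which is an assumption based on
    "Dinge" and "dinge" both being dropped in the tests described above.
    """
    kept = [word for word in query.split() if word.lower() not in ignored_words]
    return " ".join(kept)


# Hypothetical excerpt of an ignored-word list, for illustration only.
IGNORED_DE = {"sieben", "dinge"}

# Only part of the query survives the filter.
print(filter_query("sieben Schwerter", IGNORED_DE))

# Every word is filtered, so a plugin would receive an empty string.
print(repr(filter_query("sieben dinge", IGNORED_DE)))
```

When every word is on the list, nothing is left to search for, which would match the empty text field the plugin receives.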

Is that a known bug? Or is it some strange stop word filtering? To me that would be extremely bad usability; I would at least expect some feedback, such as a message telling the user that filtering is going on. Or is nobody using the on-page search anymore anyway, since Google is much better at searching pages?
Last edited by imanickam on Wed Oct 06, 2021 6:54 am, edited 1 time in total.
Reason: Moved topic » from General Questions/New to Joomla! 3.x to Language - Joomla! 3.x

imanickam
Joomla! Master
Posts: 28202
Joined: Wed Aug 13, 2008 2:57 am
Location: Chennai, India

Re: Weird word filtering in search (bug?)

Post by imanickam » Wed Oct 06, 2021 3:55 am

Please look into the file com_finder.commonwords.txt that is located in the directory language\de-DE.

This file contains all the common words that are filtered out of searches. Being a text file, it can be edited. However, be aware that this file may be overwritten by future releases of the corresponding language pack. So keep a copy of the edited file so that you can reapply your changes to the newly overwritten file.

Note:
This assumes that you are using the German language pack. If not, use the appropriate language code in place of de-DE.
Ilagnayeru (MIG) Manickam | இளஞாயிறு மாணிக்கம்
Joomla! - Global Moderators Team | Joomla! Core - Tamil (தமிழ்) Translation Team Coordinator
Former Joomla! Translations Coordination Team Lead
Eegan - Support the poor and underprivileged

froehli
Joomla! Enthusiast
Posts: 107
Joined: Thu Jun 18, 2009 5:03 pm

Re: Weird word filtering in search (bug?)

Post by froehli » Wed Oct 06, 2021 5:16 am

Thanks for the input! I thought I had searched for the stop words before, but apparently my search wasn't thorough enough.

I couldn't find the com_finder.commonwords.txt file, but I searched around a bit and found the ...localise.php scripts (perhaps the way the list works changed in some recent Joomla version?).

The German file (language/de-DE/de-DE.localise.php) returns an extensive list of search words via `getIgnoredSearchWords`. The list returned by the English version (language/en-GB/en-GB.localise.php) contains only three words: and, in, on.

To me, this stop word list is excessive and actively confuses users: if they search for some word and no result turns up, with not even a hint about what's going on, that's the worst user experience there can be. They will say the search does not work.
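
As a rough illustration of the kind of hint that is missing, the following Python sketch builds a message naming the words that were dropped from a query. It is only a sketch under stated assumptions: `split_query`, `feedback_message`, and the message wording are hypothetical and not part of Joomla; only the three English words come from the list mentioned above.

```python
from __future__ import annotations


def split_query(query: str, ignored_words: set[str]) -> tuple[list[str], list[str]]:
    """Split a query into (kept, dropped) words, matching case-insensitively."""
    kept: list[str] = []
    dropped: list[str] = []
    for word in query.split():
        if word.lower() in ignored_words:
            dropped.append(word)
        else:
            kept.append(word)
    return kept, dropped


def feedback_message(query: str, ignored_words: set[str]) -> str | None:
    """Return a hint for the user, or None if no word was dropped."""
    kept, dropped = split_query(query, ignored_words)
    if not dropped:
        return None
    if not kept:
        return "All search words were ignored as common words: " + ", ".join(dropped)
    return "Ignored common words: " + ", ".join(dropped)


# The three words reported for the English list in this thread.
IGNORED_EN = {"and", "in", "on"}

print(feedback_message("swords and shields", IGNORED_EN))
print(feedback_message("in on", IGNORED_EN))
print(feedback_message("swords", IGNORED_EN))
```

Distinguishing the "everything was dropped" case from the "some words were dropped" case lets a result page explain an empty search instead of silently showing nothing.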

Is there maybe some way to report this to the maintainers of the German translation?

Edit: Just found an FAQ entry on the jgerman site, where they say they filter 996 words which have "no relevance to the actual content of entries"... That such a high number of words should be "not relevant" appears highly questionable to me.

Edit2: The language source on GitHub has the com_finder.commonwords.txt (from which the localise.php is probably built during deployment?).
I have created an issue about the excessive stop word filtering for the German translation.

imanickam
Joomla! Master
Posts: 28202
Joined: Wed Aug 13, 2008 2:57 am
Location: Chennai, India

Re: Weird word filtering in search (bug?)

Post by imanickam » Wed Oct 06, 2021 6:53 am

Please accept my apologies for my mistake. Somehow, I thought that I was answering a question in the Joomla! 4 forum, hence my answer.

Glad to know that you have found what you wanted in the file localise.php and are trying to resolve the issue with the German translation team.

