Free Republic
News/Activism

To: Stultis
As a suggestion, I wonder if it would be possible to have some kind of "find related topics" function associated with keyword searches. That is, a function that would bring up a list of the keywords most commonly used in conjunction with the one the user is searching for.

What if you were to automatically generate keywords? You could take the article text, run it through an automatic summary generator, filter out really common words from the resulting summary (I, you, he, him, they, your, we, a, an, the, and, or, that, his, my, her, et cetera, et cetera), and then index whatever's left as a set of keywords for the article. Add any user-defined keywords that don't already appear in the automatically generated list, and you ought to wind up with a fairly extensive index of articles. Allow Boolean searches of the keyword index, and you should get something that's very much like a full-text search, and nearly as powerful, except that you never have to parse the full article at search time - you parse it once, when the summary is generated. It wouldn't be a 100% solution (summarizers can require a fair amount of tweaking, and they're not absolutely perfect in any case), but it strikes me as a valid possibility...
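
For what it's worth, here is a rough sketch of the idea in Python (not the Perl the site runs on, and not anything from the actual FR code). The summarizer itself is left out: the text handed to extract_keywords stands in for whatever a summary generator would produce. The stop-word list, the function names, and the in-memory dictionary index are all assumptions made up for illustration.

```python
import re

# Assumed stop-word list; a real one would be far longer.
STOP_WORDS = {"i", "you", "he", "him", "they", "your", "we", "a", "an",
              "the", "and", "or", "that", "his", "my", "her"}

def extract_keywords(summary, user_keywords=()):
    """Keep every non-stop-word from the summary, then add user keywords."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", summary.lower())
    keywords = {w for w in words if w not in STOP_WORDS}
    # A set ignores duplicates, so user keywords already present add nothing.
    keywords.update(k.lower() for k in user_keywords)
    return keywords

def build_index(articles):
    """articles maps article_id -> (summary_text, user_keywords).
    Returns keyword -> set of article ids (an inverted index)."""
    index = {}
    for article_id, (summary, user_keywords) in articles.items():
        for kw in extract_keywords(summary, user_keywords):
            index.setdefault(kw, set()).add(article_id)
    return index

def search_and(index, *terms):
    """Articles indexed under every term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def search_or(index, *terms):
    """Articles indexed under at least one term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))

def search_not(index, include, exclude):
    """Articles indexed under `include` but not under `exclude`."""
    return index.get(include.lower(), set()) - index.get(exclude.lower(), set())
```

Each article is parsed once, when the index is built; a search after that only touches the keyword sets, which is the whole point of the scheme.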

94 posted on 11/25/2003 7:17:54 AM PST by general_re (Take away the elements in order of apparent non-importance.)


To: general_re; John Robinson
What if you were to automatically generate keywords? You could take the article text, run it through an automatic summary generator, filter out really common words from the resulting summary (I, you, he, him, they, your, we, a, an, the, and, or, that, his, my, her, et cetera, et cetera), and then index whatever's left as a set of keywords for the article.

I don't know that much about database programs in general, and less about Perl in particular, but I wouldn't be surprised if that capability were built in (or readily available in library functions or add-ons). I don't see any reason to make such an index visible, however, even if it still produced matches in association with keyword searches. I like the idea of keeping keywords that users consciously select separate.

How about an "add more keywords" button on the article post page? The user could supply his own keywords (or not) and then click the button. The function would then create a page listing suggested keywords, each of which could be added with a check box (like with the topics list). Or maybe this "add more keywords" page should come up after the user clicks the post button, in which case it might be called "add/review keywords".

The suggested keywords could be generated from a combination of the following:

  1. Highly distinctive words in the article (words that appear in few other articles)
  2. Fuzzy or soundalike matching with user-supplied keywords (which would help to catch misspellings and/or provide for variant spellings of words like "al Qaeda")
  3. Keywords commonly associated elsewhere in the database with user-supplied keywords and distinctive words from the article
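
To make the three sources concrete, here is a rough Python sketch (again, not Perl and not the site's code). Everything in it is an assumption for illustration: the function names, the 5% rarity threshold, and the inputs (doc_freq as a word -> number-of-articles count, keyword_sets as the keyword sets of existing articles). Note that difflib compares spelling, not pronunciation, so it only approximates the "soundalike" part; a true phonetic match would need something like Soundex.

```python
import difflib
from collections import Counter

def rare_words(article_words, doc_freq, total_docs, max_share=0.05):
    """Source 1: words in this article that appear in few articles overall."""
    return {w for w in set(article_words)
            if doc_freq.get(w, 0) / total_docs <= max_share}

def fuzzy_matches(user_keywords, known_keywords, cutoff=0.8):
    """Source 2: existing keywords spelled much like what the user typed."""
    found = set()
    for kw in user_keywords:
        found.update(difflib.get_close_matches(kw, known_keywords,
                                               n=3, cutoff=cutoff))
    return found - set(user_keywords)

def co_occurring(seeds, keyword_sets, top=5):
    """Source 3: keywords most often filed alongside any of the seeds."""
    seeds = set(seeds)
    counts = Counter()
    for kws in keyword_sets:
        if seeds & kws:
            counts.update(kws - seeds)
    return {kw for kw, _ in counts.most_common(top)}

def suggest_keywords(article_words, user_keywords, doc_freq, total_docs,
                     known_keywords, keyword_sets):
    """Combine all three sources into one sorted list of suggestions."""
    rare = rare_words(article_words, doc_freq, total_docs)
    fuzzy = fuzzy_matches(user_keywords, known_keywords)
    related = co_occurring(set(user_keywords) | rare, keyword_sets)
    return sorted((rare | fuzzy | related) - set(user_keywords))
```

The list that comes back is what the "add/review keywords" page would show as check boxes; nothing gets attached to the article unless the poster ticks it, which keeps consciously chosen keywords separate from generated ones.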

101 posted on 11/25/2003 8:10:03 AM PST by Stultis



FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson