In Search for Answers (Another Algorithm for Generic Question Answering)
There are always multiple ways of solving the same problem, so it's only fitting that I propose another angle for question answering. This approach is the more traditional of the two. However, I do provide a few twists and potential improvements toward complete machine automation and tuning.
The Assumptions
1. Answers will appear in text containing 80% of the original question, and/or vice versa.
2. The variations can be attributed to the order, tense, spelling, form, synonyms, etc. of the words.
3. We can get extra information about the type of answer expected by examining the inclusion of special words:
who, what, when, where, why, how etc.
Is this how BrainBoost.com works? Examine for yourself with "Who is the father of data warehousing"
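Assumptions 1 and 3 can be sketched in a few lines of Python. This is only an illustration, not how any particular service works: the mapping from question word to answer type, the function names and the sample snippet are all made up for the example.

```python
import re

# Hypothetical mapping from special question word to expected answer type.
ANSWER_TYPES = {
    "who": "person",
    "when": "date",
    "where": "place",
    "why": "reason",
    "how": "manner or quantity",
    "what": "thing",
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def expected_answer_type(question):
    """Return the type hinted at by the first special word, or None."""
    for word in tokenize(question):
        if word in ANSWER_TYPES:
            return ANSWER_TYPES[word]
    return None

def overlap(question, snippet):
    """Fraction of distinct question terms that also appear in the snippet."""
    q_terms = set(tokenize(question)) - set(ANSWER_TYPES)
    if not q_terms:
        return 0.0
    return len(q_terms & set(tokenize(snippet))) / len(q_terms)

question = "Who is the father of data warehousing"
# Made-up snippet with a placeholder instead of a real name.
snippet = "SOMEONE is regarded as the father of data warehousing"

print(expected_answer_type(question))
print(overlap(question, snippet) >= 0.8)  # the 80% rule from assumption 1
```

Word order, tense and spelling variations (assumption 2) are not handled here; the overlap is a plain set comparison.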
The Strategy
1. Search via a search engine for permutations and variations of the question. Try using the Gigablast, Yahoo or Google API. Hence "Searching for Answers".
2. Score the snippets and return the results. Assume that a good snippet contains most of the terms, and that those terms lie within 500 characters of each other. Multiple answers with the same keyword proximities (see the previous few posts) can be assumed accurate and verified.
3. Remember the essence of the transformations required to find the right answer, to possibly incorporate it in future searches for answers*.
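Step 2 can be sketched roughly as below. The scoring rule (the best fraction of terms that fit inside one 500-character window) is just one possible reading of the proximity idea, and the function names are invented for the example. No search engine is called; the snippets are passed in as plain strings.

```python
import re

def term_positions(snippet, terms):
    """Map each term to the character offsets where it occurs as a whole word."""
    lowered = snippet.lower()
    positions = {}
    for term in terms:
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        hits = [m.start() for m in re.finditer(pattern, lowered)]
        if hits:
            positions[term] = hits
    return positions

def proximity_score(snippet, terms, window=500):
    """Best fraction of terms that fall together inside one window of characters."""
    if not terms:
        return 0.0
    positions = term_positions(snippet, terms)
    starts = sorted(p for hits in positions.values() for p in hits)
    best = 0
    for start in starts:
        # Count the terms with at least one occurrence inside this window.
        covered = sum(
            1 for hits in positions.values()
            if any(start <= p <= start + window for p in hits)
        )
        best = max(best, covered)
    return best / len(terms)

def rank(snippets, terms):
    """Sort snippets from best to worst proximity score."""
    return sorted(snippets, key=lambda s: proximity_score(s, terms), reverse=True)

terms = ["george", "bush", "born"]
snippets = [
    "Bush is a surname.",
    "George Walker BUSH was born on 6 Jul 1946",
]
for s in rank(snippets, terms):
    print(round(proximity_score(s, terms), 2), s)
```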
Let's walk through an example
Question:
How old is George Bush?
This can also be rewritten as a question in many ways. Some possible variations are:
1. Do you know how old George Bush is?
2. When was George Bush born?
3. What age is George Bush?
4. What is George Bush's age?
5. George Bush's age?
6. What is George Bush's birthday?
7. What is George Bush's date of birth?
8. When was George Bush's born?
...
In this context
(old, age, birth date, date of birth, born) are all inter-related properties
(is) can be ignored, as it is very popular
(When) is asking for a time-based answer, say "date.*"
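The three observations above could be turned into a small question analyser, sketched here. The word lists are copied from this example only, and the rule that age-related properties also expect a "date.*" answer is my own assumption for the sketch.

```python
import re

# Word lists taken from the example; a real system would need far larger ones.
RELATED_PROPERTIES = ["date of birth", "birth date", "old", "age", "born"]  # longest first
IGNORED_WORDS = {"is"}
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how"}

def analyse(question):
    """Split a question into property words, subject words and expected answer pattern."""
    text = re.sub(r"'s\b", "", question.lower())  # drop possessives: "bush's" -> "bush"
    text = re.sub(r"[^a-z0-9 ]", " ", text)       # drop punctuation
    asks_when = "when" in text.split()
    properties = []
    for phrase in RELATED_PROPERTIES:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text):
            properties.append(phrase)
            text = re.sub(pattern, " ", text)
    subject = [w for w in text.split()
               if w not in IGNORED_WORDS and w not in QUESTION_WORDS]
    # Assumption: age-related properties and "when" both call for a date-like answer.
    expects = "date.*" if (properties or asks_when) else None
    return {"properties": properties, "subject": subject, "expects": expects}

print(analyse("How old is George Bush?"))
print(analyse("What is George Bush's date of birth?"))
```

Both questions reduce to the same subject words, which is what lets the different phrasings share one search.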
Possible answers could have appeared as:
1. George Walker BUSH was born on 6 Jul 1946
2. Mr Bush was born July 6 , 1946
3. George Bush - George Bush Born: June 12, 1924
Notice that some answers agree with each other and some contradict each other.
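One way to spot agreeing and contradicting answers is to normalise the dates and count votes. The sketch below only understands the two date layouts seen in the three answers above; the patterns and function name are illustrative, and month names are parsed with the default (English) locale.

```python
import re
from collections import Counter
from datetime import datetime

# (pattern, day-month-year ordering of its groups)
DATE_PATTERNS = [
    (re.compile(r"\b(\d{1,2}) ([A-Za-z]{3,9}) (\d{4})\b"), "{0} {1} {2}"),      # 6 Jul 1946
    (re.compile(r"\b([A-Za-z]{3,9}) (\d{1,2}) ?, ?(\d{4})\b"), "{1} {0} {2}"),  # July 6 , 1946
]

def extract_date(text):
    """Return the first date found in the text, or None."""
    for pattern, order in DATE_PATTERNS:
        for match in pattern.finditer(text):
            day_month_year = order.format(*match.groups())
            for fmt in ("%d %b %Y", "%d %B %Y"):  # short or full month name
                try:
                    return datetime.strptime(day_month_year, fmt).date()
                except ValueError:
                    continue
    return None

answers = [
    "George Walker BUSH was born on 6 Jul 1946",
    "Mr Bush was born July 6 , 1946",
    "George Bush - George Bush Born: June 12, 1924",
]
votes = Counter(d for d in map(extract_date, answers) if d is not None)
for day, count in votes.most_common():
    print(day.isoformat(), count)
```

Here the two 1946 answers back each other up and outvote the 1924 one, although a vote count alone cannot tell whether the minority answer is wrong or simply about a different person.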
Essence of the transformations
1. * * * was born on date.*
2. Mr * was born date.*
3. * * - * * Born: date.*
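A minimal sketch of how such templates might be derived automatically: replace the date with "date.*" and each name word with "*". The list of name words is supplied by hand here, which is a simplification; the date pattern only covers the layouts in this example.

```python
import re

# Matches "6 Jul 1946" or "July 6 , 1946" style dates (illustrative only).
DATE = re.compile(
    r"\b(?:\d{1,2} [A-Za-z]{3,9} \d{4}|[A-Za-z]{3,9} \d{1,2} ?, ?\d{4})\b"
)

def essence(snippet, name_words):
    """Replace the date with 'date.*' and every name word with '*'."""
    with_date = DATE.sub("date.*", snippet)
    tokens = [
        "*" if token.lower() in name_words else token
        for token in with_date.split()
    ]
    return " ".join(tokens)

# Assumption: the words that make up the person's name are already known.
names = {"george", "walker", "bush"}

for answer in [
    "George Walker BUSH was born on 6 Jul 1946",
    "Mr Bush was born July 6 , 1946",
    "George Bush - George Bush Born: June 12, 1924",
]:
    print(essence(answer, names))
```

Tokens are compared whole, so a name with punctuation attached (for example "Bush,") would be left as it is.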
The Algorithm
1. Using all the criteria from the query, expand each word into its sets (see Sets Algorithm).
2. Try searching for every combination, including leaving out words.
3. Gather all the returned snippets and rank them for quality.
4. Remember the essence of the transformations
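Steps 1 and 2 might look something like this sketch. The word sets are hard-coded stand-ins for whatever the Sets Algorithm would produce, and `min_terms` is a made-up parameter that limits how many words may be left out.

```python
from itertools import combinations, product

# Hypothetical word sets; the real ones would come from the Sets Algorithm.
WORD_SETS = {
    "old": ["old", "age", "born", "date of birth"],
    "george": ["George"],
    "bush": ["Bush"],
}

def query_variants(terms, word_sets, min_terms=2):
    """Yield each query built by keeping a subset of terms and swapping in set members."""
    seen = set()
    # Start with all terms, then leave out one more word at each pass.
    for size in range(len(terms), min_terms - 1, -1):
        for kept in combinations(terms, size):
            options = [word_sets.get(t, [t]) for t in kept]
            for choice in product(*options):
                query = " ".join(choice)
                if query not in seen:
                    seen.add(query)
                    yield query

variants = list(query_variants(["old", "george", "bush"], WORD_SETS))
print(len(variants))
for q in variants[:4]:
    print(q)
```

The number of queries grows quickly with the size of each set, so in practice some cap or ordering by likely usefulness would probably be needed before sending them to a search API.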
Together, both approaches to question answering can increase the odds of finding the right answer, or of recognizing it among a collection of possible right answers.
It definitely doesn't end here.
