feature with wide-ranging applications. The only thing that might be more interesting is the possible algorithm behind it.
The secrets are revealed in its properties:
- Google sets aren’t ordered
- The sets are generic
- Items with extremely large amounts of sample data, such as numbers and alphabet characters, produce less logical sets, implying too many samples
Here’s an algorithm for how it could potentially be recreated:
- Scan and parse the web for simple HTML tables and/or lists, keeping only their textual content. This can be done with the simple spidering techniques widely employed by search engines.
- Break up each column into fields and store each field as a record, including a set id uniquely identifying the column.
- Break up each row into fields and store each field as a record, including a set id uniquely identifying the row.
- Repeat continuously for different tables on different sites.
- Store the results for later use in a database table similar to:
CREATE TABLE sets (
field_id int NOT NULL,
set_id int NOT NULL,
field varchar (255) NOT NULL
)
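The scanning step can be sketched with nothing more than Python's standard `html.parser` module. This is a minimal illustration, not a description of how Google does it: the class name `TableExtractor` and the helper `extract_tables` are hypothetical, nested tables are not handled, and fetching pages from the web is left out entirely.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects every <table> as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables
        self._rows = None  # rows of the table being read, or None
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep textual content only; markup inside a cell is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._rows.append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows = self._row = self._cell = None


def extract_tables(html):
    """Return all simple tables found in an HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


page = (
    "<table><tr><th>Product</th><th>Cost</th></tr>"
    "<tr><td>Orange</td><td>5</td></tr>"
    "<tr><td>Banana</td><td>6</td></tr></table>"
)
tables = extract_tables(page)
```

Each extracted table is a plain list of rows, which is all the later steps need.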
For example
If we had 2 tables to parse, similar to
- Nick,Male
- John,Male
and
- Product,Cost
- Orange,5
- Banana,6
The sets dataset table would be
- field_id,set_id,Field
- 1,1,Nick
- 2,1,John
- 3,2,Male
- 4,2,Male
- 5,3,Nick
- 6,3,Male
- 7,4,John
- 8,4,Male
- 9,5,Product
- 10,5,Orange
- 11,5,Banana
- 12,6,Cost
- 13,6,5
- 14,6,6
- 15,7,Product
- 16,7,Cost
- 17,8,Orange
- 18,8,5
- 19,9,Banana
- 20,9,6
Notice
- Header records are not treated any differently
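The worked example above can be reproduced with a short sketch. The function name `table_to_records` and its parameters are hypothetical, and the sketch assumes every row of a table has the same number of cells. Columns are emitted before rows, which is the order the example uses.

```python
def table_to_records(rows, next_field_id=1, next_set_id=1):
    """Turn one table into (field_id, set_id, field) records.

    Every column becomes a set, then every row becomes a set.
    Returns the records plus the next free field_id and set_id.
    """
    records = []
    columns = list(zip(*rows))  # transpose rows into columns
    for group in columns + [tuple(row) for row in rows]:
        for field in group:
            records.append((next_field_id, next_set_id, field))
            next_field_id += 1
        next_set_id += 1
    return records, next_field_id, next_set_id


people = [["Nick", "Male"], ["John", "Male"]]
products = [["Product", "Cost"], ["Orange", "5"], ["Banana", "6"]]

# Ids keep counting across tables, as in the example dataset.
records, field_id, set_id = table_to_records(people)
more, field_id, set_id = table_to_records(products, field_id, set_id)
records += more
```

Header cells such as Product and Cost go through the same loop as every other cell, so they end up in sets like any other field.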
Getting the Sets
The Sybase ASE query that can be used to generate the desired set is then:
Select top 15
field, count(set_id)
From
sets
where
set_id in ( select set_id from sets where field in ("apple","Orange") )
group by
field
order by
count(set_id) DESC
Comments
The SQL above generates sets whose members are centered on “apple” and “Orange”. The top 15 option limits the output to the 15 most frequent fields. If we took the entire list, it would get less and less accurate as the count approaches 1. The elegance is in the count function, and how it clusters popular results together.
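To try the query without a Sybase server, here is a rough equivalent using Python's built-in `sqlite3` module and the 20 example records. It is a sketch under assumptions: SQLite has no `top` option, so `LIMIT` stands in for it, and the function name `related_fields` is hypothetical.

```python
import sqlite3

# The 20 records from the worked example: (field_id, set_id, field).
RECORDS = [
    (1, 1, "Nick"), (2, 1, "John"), (3, 2, "Male"), (4, 2, "Male"),
    (5, 3, "Nick"), (6, 3, "Male"), (7, 4, "John"), (8, 4, "Male"),
    (9, 5, "Product"), (10, 5, "Orange"), (11, 5, "Banana"),
    (12, 6, "Cost"), (13, 6, "5"), (14, 6, "6"),
    (15, 7, "Product"), (16, 7, "Cost"),
    (17, 8, "Orange"), (18, 8, "5"), (19, 9, "Banana"), (20, 9, "6"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sets ("
    " field_id INTEGER NOT NULL,"
    " set_id INTEGER NOT NULL,"
    " field VARCHAR(255) NOT NULL)"
)
conn.executemany("INSERT INTO sets VALUES (?, ?, ?)", RECORDS)


def related_fields(conn, seeds, limit=15):
    """Count how often each field shares a set with any of the seed fields."""
    marks = ", ".join("?" for _ in seeds)
    sql = (
        "SELECT field, COUNT(set_id) AS hits FROM sets "
        f"WHERE set_id IN (SELECT set_id FROM sets WHERE field IN ({marks})) "
        "GROUP BY field ORDER BY hits DESC LIMIT ?"
    )
    return conn.execute(sql, [*seeds, limit]).fetchall()


results = related_fields(conn, ["apple", "Orange"])
```

With so little data only Orange matches, and it comes first because it appears in two of the matching sets; every other field ties at a count of 1, which is the inaccurate tail described above.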
Tips
- To generate more closely matched sets, favor scanning the web pages that contain each field already discovered.
- To get results with accuracy similar to Google Sets, try to collect about 2 million fields.
Enhancements
- Ordering the results by the average field_id
- Adding the ability to subtract sets by removing set_ids
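Both enhancements can be sketched on top of the same query. The reading here is an assumption, since the list above does not spell out details: the average field_id is used as a tie-breaker after the count, and subtraction drops every set_id that contains an unwanted field. The function name `related_fields_enhanced` and the `exclude` parameter are hypothetical, and SQLite via Python's `sqlite3` stands in for Sybase ASE.

```python
import sqlite3

# Records from the Product/Cost example table: (field_id, set_id, field).
RECORDS = [
    (9, 5, "Product"), (10, 5, "Orange"), (11, 5, "Banana"),
    (12, 6, "Cost"), (13, 6, "5"), (14, 6, "6"),
    (15, 7, "Product"), (16, 7, "Cost"),
    (17, 8, "Orange"), (18, 8, "5"), (19, 9, "Banana"), (20, 9, "6"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sets (field_id INTEGER, set_id INTEGER, field TEXT)")
conn.executemany("INSERT INTO sets VALUES (?, ?, ?)", RECORDS)


def related_fields_enhanced(conn, seeds, exclude=(), limit=15):
    """Rank fields by count, then by average field_id, skipping excluded sets."""
    seed_marks = ", ".join("?" for _ in seeds)
    sql = (
        "SELECT field, COUNT(set_id) AS hits, AVG(field_id) AS avg_id FROM sets "
        f"WHERE set_id IN (SELECT set_id FROM sets WHERE field IN ({seed_marks})) "
    )
    params = list(seeds)
    if exclude:
        # Subtract sets: drop every set_id that contains an excluded field.
        ex_marks = ", ".join("?" for _ in exclude)
        sql += (
            "AND set_id NOT IN "
            f"(SELECT set_id FROM sets WHERE field IN ({ex_marks})) "
        )
        params += list(exclude)
    sql += "GROUP BY field ORDER BY hits DESC, avg_id ASC LIMIT ?"
    params.append(limit)
    return [row[0] for row in conn.execute(sql, params)]


plain = related_fields_enhanced(conn, ["Orange"])
without_prices = related_fields_enhanced(conn, ["Orange"], exclude=["5"])
```

Breaking ties by average field_id tends to restore the order in which fields were first scanned, which gives otherwise arbitrary ties a stable order.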
Other uses for a similar algorithm
This algorithm could also be used to create sets of commonly related keywords by scanning the meta keyword tags of web pages, since most of the keywords on a page will be related to each other somehow.
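A minimal sketch of the keyword idea, again using only Python's standard `html.parser`: each page's meta keywords become one set. The names `MetaKeywordParser` and `keyword_set` are hypothetical, and the sketch assumes keywords are comma-separated, which is the usual convention for this tag.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collects the comma-separated entries of <meta name="keywords"> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "keywords":
            return
        content = attributes.get("content") or ""
        # Split on commas and drop empty or whitespace-only entries.
        self.keywords.extend(k.strip() for k in content.split(",") if k.strip())


def keyword_set(html):
    """Return the keywords of one page; they would all share a single set_id."""
    parser = MetaKeywordParser()
    parser.feed(html)
    parser.close()
    return parser.keywords


page = '<html><head><meta name="keywords" content="apple, orange , banana"></head></html>'
keywords = keyword_set(page)
```

Storing each page's keywords under a single set_id lets the same count query rank related keywords.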
Check it out first hand at questsin.net!!!

4 comments:
This explanation is too basic. Do a set query for "patch adams" and "mrs doubtfire" and you'll get a list of movies Robin Williams starred in. This accuracy could only be achieved by deriving a more relative association with the titles of the movies and the actors that played in them rather than simply "it's a movie title", "here's the name" that parsing a table would achieve. You have the basics, but there's much more to the relationships than simply parsing a table.
I work in reverse engineering myself, and I like what you have to say.
I am an SEO consultant and I run my own SEO company