Thursday, June 02, 2005

Google Suggest - Racism in the Machine

Just 20 years ago it would have taken an enormous amount of effort to
duplicate the functionality of Google Sets (see the previous few posts for a how-to). Just look at projects like http://www.cyc.com and WordNet from Princeton. These systems were originally built via supervised learning and screened training data. Compared with Google, they feel incomplete. Google Sets and similar algorithms now sprouting up, based on unsupervised learning with the generic web as a training corpus, seem to bring new life to the field of Artificial Intelligence. However, there is always a cost.

Google, Yahoo and even Microsoft, with all their resources, try to develop
filtering mechanisms, but can they really expect to filter out the very
essence of the web?

What is this essence? You only have to answer a few simple questions. What percentage of the web is related to sex, crime, slander, and racism? What percentage of blogs are biased and rant-based?

Exploiting these techniques will inevitably create AIs that are not better than us,
but more deviant and racist. Maybe the movies had it right all along.

Skeptical? Just look at Google Suggest for a glimpse of our dismal future.

Below are screen captures of Google's Suggest feature. It expands your
keywords based on result counts discovered while spidering.

blacks are..




whites are..




jews are..




germans are..




greeks are..




chinese are..


It's not my intention to bring grief to Google. Google is not to blame. They are only the medium; we are the message.

What's the point of this warning? Maybe the simplest road is just not worth taking. The cost is just too great in the end.

Look at what Google says about itself with "google is ..."

16 comments:

c50c said...

Interesting, but what I always wonder is what the world's obsession with "gay" is, e.g. if you put almost any name into Google Suggest, what do you get...

name is gay!!!

Where does google find all these sites that list people as gay?!

questsin said...

I'll be posting an article similar to "Reverse Engineering Google Sets" on "Reverse Engineering Google Suggest".

This algorithm doesn't require complete phrases but builds them as it goes. However it will handle complete phrases as well.

cya

questsin said...

Using Google Search

default operator is AND
* matches any word
"" treats the phrase as a single word
-word excludes the word, but if the word doesn't exist the cache is skipped

Try
god is gay - Results 1 - 10 of about 6,790,000 for god is gay. (0.11 seconds)
god is gay -questsin - Results 1 - 10 of about 6,800,000 for god is gay -questsin. (0.14 seconds)
Cool how, if we subtract a word that doesn't exist, we get more results.

"god" - Results 1 - 10 of about 149,000,000 for "god" [definition]. (0.08 seconds)
"is" - Results 1 - 10 of about 4,100,000,000 for is. (0.10 seconds)
"gay" - Results 1 - 10 of about 81,600,000 for gay [definition]. (0.12 seconds)


"god is" - Results 1 - 10 of about 5,050,000 for "god is". (0.06 seconds)
"god is" -questsin - Results 1 - 10 of about 5,050,000 for "god is" -questsin. (0.07 seconds)
"god is *" - Results 1 - 10 of about 5,050,000 for "god is *". (0.28 seconds)

"god is gay" - Results 1 - 10 of about 6,250 for "god is gay". (0.22 seconds)
"god is" -gay - Results 1 - 10 of about 4,450,000 for "god is" -gay. (0.11 seconds)


"is gay" - Results 1 - 10 of about 620,000 for "is gay". (0.26 seconds)
"god * gay" - Results 1 - 10 of about 47,900 for "god * gay". (0.20 seconds)
"* * gay" - Results 1 - 10 of about 41,300,000 for "* * gay". (0.28 seconds)


Using Google Suggest
Try
god - 129,000,000 results.
god is = 109,000,000 results (summed since 'god is' not in list by itself)
god is gay - 8,098,000 results.
is = 29,000,000 results (summed since 'is' not in list by itself)
is gay - 123,000,000 results.


(Google Suggest of "god")/(Google Search of "god") = 129,000,000/149,000,000 = ~0.87
Why isn't this equal?
1. Google Suggest is stale since it's in the lab


(Google Suggest of "god is gay")/(Google Search of "god is gay") = 8,098,000/6,250 = 1295.68 times more
Even if Google Suggest's data is stale, we would expect fewer results, not more!
Where do the rest of the results come from? Google must be synthesizing.
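
The two ratios above can be rechecked with a few lines of Python. This is only a sketch: the counts are copied from the listings in this comment, and the function name is made up for illustration.

```python
# Result counts copied from the listings above.
search = {"god": 149_000_000, "god is gay": 6_250}
suggest = {"god": 129_000_000, "god is gay": 8_098_000}

def suggest_to_search_ratio(phrase):
    """Suggest result count divided by Search result count for one phrase."""
    return suggest[phrase] / search[phrase]

for phrase in search:
    print(phrase, "->", round(suggest_to_search_ratio(phrase), 2))
```

A ratio a little under 1 fits the "stale data" explanation; a ratio over a thousand does not.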


results('god is')/results('god') = 0.033892617449664429530201342281879 = the chance "is" comes after "god"

results('is gay')/results('is') = 0.00015121951219512195121951219512195 = the chance "gay" comes after "is"

r(god is)/r(god) * r(is gay)/r(is) = 5.1252250777541332460304468815171e-6 = ~1/195113 = the combined chance
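
As a rough sketch of the same arithmetic in Python: each "chance" is the count for a two-word phrase divided by the count for its first word, and the combined chance is the product of the two. The counts are the Google Search numbers listed earlier in this comment; the helper name `chance_next` is invented here, and treating result counts as probabilities is an assumption, not anything Google documents.

```python
# Google Search result counts copied from the listings above.
counts = {
    "god": 149_000_000,
    "is": 4_100_000_000,
    "god is": 5_050_000,
    "is gay": 620_000,
}

def chance_next(first, second):
    """Rough chance that `second` follows `first`: count(pair) / count(first)."""
    return counts[f"{first} {second}"] / counts[first]

p_is_after_god = chance_next("god", "is")
p_gay_after_is = chance_next("is", "gay")

# Chain the two steps: "god" -> "is" -> "gay".
combined = p_is_after_god * p_gay_after_is

print(p_is_after_god, p_gay_after_is, combined, round(1 / combined))
```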

paul said...

Try
"The english are" in suggest.

Apparently they are "so nice" :)

questsin said...

Google Suggest should try adding a ranking system for each suggestion, similar to the Google Vote for a site in the toolbox. If many users rank a suggestion negatively it would remove the suggestion, and vice versa.

questsin said...

I've been trying to find my blogs on Google's new blogsearch.google.com. (good job by the way)

I could find them all except* this one.

Hmm.. I wonder why ;)

It might have something to do with Google's fight against trademark infringement on their own brand identity..

But blogger.com is owned and operated by Google, and it is a forum/community site devoted to hype.. seems to almost defeat the purpose.. If no one could talk about products and services.. what's the point of a blogging service site :)

Or maybe it's their embarrassment.. No one's perfect!

questsin said...

I'm considering pulling this blog off completely.. Don't want to get on Google's bad side.. I'm not big enough like C|net

themaxx.ca said...

You're too easily offended!
I don't see anything to make a big deal of...

JalenJade said...

Aren't results like this why Google wants to stop indexing blogs? I bet that's the main reason for results like those.

Anonymous said...

You use IE? Shame on you

Anonymous said...

Well, according to Google Suggest, Google is: Gay, evil, shit and crap. I think Google suggest likes colourful words.

Edward Clarke said...

Google Suggest is working by logged searchable phrases and not results. It's a great means of looking into the searching popularity and the phrases being used.

Shame stereotypes are reinforced this way though.

In terms of countries has anyone noticed a distinct bias to the West?

Anonymous said...

OHHHH PLEASE!!!!!

If you type in phrases that generalise a race of people of course you're going to get racist results!

For all the people here who would not consider themselves racist, when was the last time that you used a sentence that contained one of the following phrases:

"Blacks are "
"Whites are "
"Jews are "
"Germans are "
"Greeks are "
"Chinese are "

Not to mention "Blacks", "Whites", "Jews"???.... All racist terms, I bet you would have been surprised if "Niggers are " had come up with something racist. *rolls eyes*

Ohh here's a result page I just found using your technique that you might be able to relate to http://www.google.com/search?complete=1&hl=en&q=bloggers+are+idiots&btnG=Google+Search

Gee got any more FUD wisdom to share?

Anonymous said...

Strange how google allows all these potentially discriminatory phrases including some primarily racist terms not mentioned here, yet it kills suggestions for transsexual (or transsexuality) as soon as you type the x.

Obviously they are censoring some phrases. Last time I looked, transsexuality was considered a medical disorder. It also kills suggestions for asexual, homosexual, bisexual and lesbian, but not gay, and intersex (another medical condition). Now, wouldn't this block sites to support sites for asexuals, bisexuals, lesbians, transsexuals and intersex people? In fact, they seem to be blocking any word that has sex in it. If they are blocking any suggestions for adult sites, they aren't doing a very good job as I found several suggestions for searches for these sites by using other phrases.

I hope this isn't the future of Google, otherwise many people (including myself) will start looking elsewhere for their searches. This needs to be fixed fast.

Anonymous said...

Ohh bloody hell, it kills it because of the word "sex", nothing to do with transsexual or transsexuality.

6th one down: http://www.google.com/search?complete=1&hl=en&q=people+are+&btnG=Google+Search
