The Cellar  


The Internet Web sites, web development, email, chat, bandwidth, the net and society

05-27-2005, 03:40 PM #1
Gwennie!
Not Female at Birth
 
Join Date: Dec 2003
Location: Anaheim
Posts: 166
Wolf and Gwennie: attached at the hips?

http://search.msn.com/results.aspx?F...pe=0&q=Gwennie

The link to Wolf's profile doesn't even point to the right page. Her profile ranks higher than mine in a search for Gwennie? This looks like a bug in the brand spanking new search engine at MSN.
__________________
Only The Crumbliest Flakiest Gwennie!
05-27-2005, 03:57 PM #2
perth
Strong Silent Type
 
Join Date: Mar 2002
Location: Fort Collins, CO
Posts: 1,949
Hrm. We've seen this before, haven't we? It changes based on the last poster.
05-27-2005, 04:37 PM #3
lookout123
changed his status to single
 
Join Date: Apr 2004
Location: Right behind you. No, the other side.
Posts: 10,308
yeah, it is something like that, perth.

for a while (maybe still), if i searched for lookout123 it came back with links to the bosque and my title as "closet democrat". i can't find that post anywhere, but it shows up on google.
__________________
Getting knocked down is no sin, it's not getting back up that's the sin
05-27-2005, 06:21 PM #4
Gwennie!
Not Female at Birth
 
Join Date: Dec 2003
Location: Anaheim
Posts: 166
Quote:
Originally Posted by perth
Hrm. We've seen this before, haven't we? It changes based on the last poster.
Yeah, that's the problem with generic spidering of executable URLs (dynamic, script-generated pages such as a last-poster link). But what is weird in this case is that there is a version skew between the words in the search index and the content displayed on the results page. It's indexed with "Gwennie", displayed with "Wolf", and linked to the last poster.

I met with the search team at MSN last year; they're all focused on low-level details like having the web crawler make direct calls to WinSock and tuning the C code. Yet they have these serious algorithmic problems with URL normalization and index/display version skews.
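For illustration, here is a minimal sketch of one way to avoid that index/display skew. It assumes, hypothetically, that the engine keeps the exact text it indexed for each URL: snippets are built from that snapshot, so the words that matched are the words shown, and a fingerprint comparison flags pages whose live content has changed since indexing. `SnapshotIndex`, `live_fetch`, and the sample URL are made up for this example; nothing here describes how MSN Search actually worked.

```python
import hashlib
from typing import Callable


def fingerprint(text: str) -> str:
    """Stable digest of a page's text, used to detect that the page changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class SnapshotIndex:
    """Toy inverted index that remembers exactly which version of each page it indexed."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = {}  # term -> URLs containing it
        self.snapshots: dict[str, str] = {}      # URL -> text as indexed
        self.digests: dict[str, str] = {}        # URL -> fingerprint at index time

    def add(self, url: str, text: str) -> None:
        self.snapshots[url] = text
        self.digests[url] = fingerprint(text)
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(url)

    def search(self, term: str, live_fetch: Callable[[str], str]) -> list[tuple[str, str]]:
        """Return (url, snippet) pairs.

        Snippets come from the indexed snapshot, so the matching words are the
        words displayed; pages whose live content no longer matches are flagged.
        """
        results = []
        for url in sorted(self.postings.get(term.lower(), set())):
            stale = fingerprint(live_fetch(url)) != self.digests[url]
            snippet = self.snapshots[url] + (" [stale]" if stale else "")
            results.append((url, snippet))
        return results


# Hypothetical last-poster URL: indexed while it showed Gwennie, now shows Wolf.
idx = SnapshotIndex()
idx.add("member.php?u=lastposter", "Gwennie profile")
live_pages = {"member.php?u=lastposter": "Wolf profile"}
print(idx.search("Gwennie", live_pages.get))
```

The design choice here is to display the version that was scored and to mark it stale, instead of pairing old index terms with freshly fetched content.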

When I interviewed there, it was like oil and water; we didn't mix at all. I do high-level programming in Java, Tomcat, and Linux, and that wasn't popular with them. I only went up there because their recruiters called me.

Glad to see they've launched MSN Search and shown us how 'great' it is.
05-28-2005, 04:55 PM #5
Gwennie!
Not Female at Birth
 
Join Date: Dec 2003
Location: Anaheim
Posts: 166
I've been thinking about how this bug could arise. Their spidering/indexing process would have to be fairly screwed up to produce a version skew in the content.

It is clear to me now that they index the text of a link to a page along with the content of the page itself. The spider grabs a forum index page with a last-poster link whose text is "Gwennie", but by the time the spider follows that link, it leads to Wolf's profile page, so the link text gets associated with the wrong page. Since Wolf's profile page outscored the Gwennie! profile page, it is apparent that they put too much weight on link text. They have plenty of tuning to do.
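To show how this can happen, here is a toy scoring function. It assumes, hypothetically, a plain term-count score for the page body plus a tunable `anchor_weight` on inbound link text; the weights and sample pages are invented, and real engines' ranking formulas are not public. It demonstrates how a few mis-attributed anchors can outrank the genuine page once link text is weighted heavily enough.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercase alphanumeric words, so "Gwennie!" and "Gwennie" match."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(page_text: str, anchor_texts: list[str], query: str, anchor_weight: float) -> float:
    """Body term count plus a weighted count of the query in inbound link text."""
    q = query.lower()
    body_hits = tokens(page_text).count(q)
    anchor_hits = sum(tokens(a).count(q) for a in anchor_texts)
    return body_hits + anchor_weight * anchor_hits


gwennie_page = "Gwennie! profile. Signature: Only The Crumbliest Flakiest Gwennie! Home page of Gwennie!"
wolf_page = "wolf profile. High Priestess of the Church of the Whale Penis"
# Hypothetical last-poster links whose text said "Gwennie" but which resolved to Wolf's page.
misattributed = ["Gwennie"] * 4

print(score(gwennie_page, [], "Gwennie", 0.5))         # 3.0
print(score(wolf_page, misattributed, "Gwennie", 0.5))  # 2.0, genuine page still wins
print(score(wolf_page, misattributed, "Gwennie", 2.0))  # 8.0, wrong page now wins
```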

Those of us who were search engineers before the Internet came along don't lean on crutches like these newbies do. Pure linguistic algorithms don't use link information; instead, they find related documents by the phrases they share.
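One simple, link-free way to relate documents by shared phrases is to compare their sets of word n-grams using Jaccard similarity. This generic technique is chosen for illustration, with bigrams (`n=2`) as an assumed default; it is not a specific algorithm named in the post.

```python
import re


def phrases(text: str, n: int = 2) -> set[tuple[str, ...]]:
    """All consecutive n-word phrases in the text, lowercased."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def phrase_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard similarity of the two documents' phrase sets (0.0 to 1.0)."""
    pa, pb = phrases(a, n), phrases(b, n)
    if not pa and not pb:
        return 0.0
    return len(pa & pb) / len(pa | pb)


# 2 shared bigrams out of 3 distinct ones, so about 0.67.
print(phrase_similarity("the whale penis thread", "whale penis thread"))
```

No link graph is consulted: two pages count as related purely because they use the same wording.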

Sorry, folks, I'm just thinking out loud here, and I thought there might be some interest in search here in The Internet forum.
05-28-2005, 06:10 PM #6
wolf
lobber of scimitars
 
Join Date: Jul 2001
Location: Phila Burbs
Posts: 20,774
So, how do we get to be #1 for Whale Penis? What's the best strategy?
__________________
wolf eht htiw og

"Conspiracies are the norm, not the exception." --G. Edward Griffin The Creature from Jekyll Island

High Priestess of the Church of the Whale Penis
05-29-2005, 01:44 AM #7
Gwennie!
Not Female at Birth
 
Join Date: Dec 2003
Location: Anaheim
Posts: 166
Quote:
Originally Posted by wolf
So, how do we get to be #1 for Whale Penis? What's the best strategy?
I'm assuming y'all are trying to get the page http://whalepenis.org ranked highly. The keyword density for that page needs to be higher.

If the keyword density is too low, the page's relevance score for that keyword will be lower. But if it is too high, as in "Whale Penis, Whale Penis, Whale Penis, Whale Penis, Whale Penis, Whale Penis", that's artificially high and will be rejected. Work the keywords of interest into the web page content as much as possible. Replacing pronouns with the keywords and repeating them in normal language works best.
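Here is a rough sketch of measuring keyword density, counted as the fraction of a page's words taken up by non-overlapping occurrences of the phrase. The post gives no numbers, and no search engine publishes its cutoffs, so the `low` and `high` thresholds in `classify` are purely hypothetical.

```python
import re


def keyword_density(text: str, phrase: str) -> float:
    """Fraction of the text's words covered by non-overlapping occurrences of phrase."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    target = re.findall(r"[a-z0-9]+", phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    i = hits = 0
    while i <= len(words) - n:
        if words[i:i + n] == target:
            hits += 1
            i += n  # skip past the match so occurrences never overlap
        else:
            i += 1
    return hits * n / len(words)


def classify(density: float, low: float = 0.02, high: float = 0.20) -> str:
    """Hypothetical cutoffs: below low is weak, above high looks like stuffing."""
    if density < low:
        return "too low"
    if density > high:
        return "looks stuffed"
    return "ok"


stuffed = "Whale Penis, Whale Penis, Whale Penis"
natural = "The Church of the Whale Penis is a group of people"
print(keyword_density(stuffed, "Whale Penis"), classify(keyword_density(stuffed, "Whale Penis")))
print(round(keyword_density(natural, "Whale Penis"), 2), classify(keyword_density(natural, "Whale Penis")))
```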

Follow the keyword density of The Cellar profile pages, like this one:

The Cellar Go Back The Cellar > View Profile Reload this Page Gwennie!
User CP Register FAQ Members List Calendar New Posts Search Quick Links Log Out
Search Forums Advanced Search Quick Links New Posts Mark Forums Read
Open Buddy List User Control Panel Edit Signature Edit Profile Edit
Options Miscellaneous Private Messages Subscribed Threads My Profile
Who's Online View Profile: Gwennie!
Gwennie! I'm Just a Gwannabe Gwennie!'s picture Offline
Add Gwennie! to Your Buddy List Add Gwennie! to Your Ignore List
Signature: Only The Crumbliest Flakiest Gwennie!
Forum Info Contact Info Join Date: 12-13-2003 Posts Total Posts: 125 (0.23 posts per day)
Find all posts by Gwennie! Find all threads started by Gwennie!
Home Page: http://tragickingdom.net/
Email: Send a message via email to Gwennie!
Private Message: Send a private message to Gwennie!
Additional Information Group Memberships
Birthday: October 3, 1969 Biography: Location: Anaheim Interests: Occupation: software engineer
Gwennie! is not a member of any public groups


Here's a rewrite of the page that should rank it higher.

Whale Penis: Who We Are

The Church of the Whale Penis is a group of people who merely want to be the number one Google site for "Whale Penis." Yes, we're serious. No, really.

How serious?

Well, we own the Whale Penis domain whalepenis.org. And there's this Whale Penis site. And The Church of the Whale Penis is giving out free whalepenis.org e-mail forwarders. Whale Penis friends, how's that for serious?

What is this Whale Penis stuff? Some porn BS?

No! Whale Penis is not about porn at all! Seriously. See "Whale Penis: Who We Are" above for more about The Church of the Whale Penis.

For the Whale Penis site, what is the current ranking on Google in searches for "Whale Penis"?

This Whale Penis website is #228 as of May 19, 2005 in searches for "Whale Penis".

Alright...how do I help make the Whale Penis rise?

Link to us! Friends of the Whale Penis, once you link to us, e-mail us at link(@)whalepenis.org (remove the parentheses around the at sign), and we'll link to you! And tell your family and friends about us!

Links related to this Whale Penis site

The Bosque
The Cellar Image of the Day

Last edited by Gwennie!; 05-29-2005 at 01:46 AM.
05-29-2005, 01:52 AM #8
Gwennie!
Not Female at Birth
 
Join Date: Dec 2003
Location: Anaheim
Posts: 166
Two more things. The posts in the thread on El Ciberbosque should have links to http://whalepenis.org/

If you want Google to visit that page more often, put Google AdSense at the bottom of the page. They spider AdSense pages more often than others. You could also put AdSense on El Ciberbosque to get those pages spidered more often.

Church of the Whale Penis
05-29-2005, 02:09 AM #9
Gwennie!
Not Female at Birth
 
Join Date: Dec 2003
Location: Anaheim
Posts: 166
Wow, MSN re-spidered these pages today. Now I'm attached to Troubleshooter.
05-29-2005, 05:08 AM #10
Undertoad
Radical Centrist
 
Join Date: Jan 2001
Location: Cottage of Prussia
Posts: 31,423
Google's PageRank gives attention to sites that have heavy inbound linking, so the first thing is to get a lot of sites to link to the site -- with the keywords.

Google assigns more priority to sites that change regularly, so the next thing to do is to put dynamic content on the page.
05-29-2005, 08:42 AM #11
wolf
lobber of scimitars
 
Join Date: Jul 2001
Location: Phila Burbs
Posts: 20,774
Actually, syc's goal is to get the infamous "Whale Penis Thread" to be number one for whale penis, but would probably be satisfied with getting the church's webpage up there as well. Thanks for the analysis!
05-31-2005, 01:20 AM #12
Gwennie!
Not Female at Birth
 
Join Date: Dec 2003
Location: Anaheim
Posts: 166
Quote:
Originally Posted by Undertoad
Google's PageRank gives attention to sites that have heavy inbound linking, so the first thing is to get a lot of sites to link to the site -- with the keywords.
This is true, but it acts more like a tiebreaker. Relevance scores are a function of keyword density. Many pages will have similar relevance scores, and PageRank then boosts the pages that are linked to. CoWP already has a linking program, but the home page text needed to be edited for search engine scoring.
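Below is a sketch of the "relevance first, PageRank as tiebreaker" idea as described here. The power-iteration PageRank with a 0.85 damping factor follows the commonly published formulation. The bucketing of relevance scores and the way the two are combined are hypothetical simplifications, because Google's actual combination was never public. Page names are invented.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85, iters: int = 50) -> dict[str, float]:
    """Power-iteration PageRank; links maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = damping * rank[p] / len(outs)
                for t in outs:
                    new[t] += share
            else:
                # Dangling page (no outlinks): spread its rank evenly over all pages.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


def rank_results(relevance: dict[str, float], pr: dict[str, float], bucket: float = 0.05) -> list[str]:
    """Sort by relevance bucket (near-equal scores tie), then break ties by PageRank."""
    return sorted(relevance, key=lambda p: (-round(relevance[p] / bucket), -pr.get(p, 0.0)))


# Hypothetical graph: three sites link to "cowp"; "other" has no inbound links.
graph = {"a": ["cowp"], "b": ["cowp"], "c": ["cowp"], "cowp": ["a"], "other": []}
pr = pagerank(graph)
print(rank_results({"cowp": 0.31, "other": 0.32}, pr))  # similar relevance, so PageRank decides
print(rank_results({"cowp": 0.10, "other": 0.32}, pr))  # clearly better relevance wins
```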

Quote:
Originally Posted by Undertoad
Google assigns more priority to sites that change regularly, so the next thing to do is to put dynamic content on the page.
This is true up to a point. Google cites news pages and airline schedules as examples of content that changes too frequently to be kept in the web-search index.
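A common way for a crawler to balance "changes regularly" against "changes too often" is an adaptive revisit interval. The sketch below shortens the interval when a page has changed since the last visit and lengthens it when it hasn't. It also flags pages that change on nearly every visit. The halving and doubling rule, the bounds, and the 90% volatility cutoff are all hypothetical; Google's scheduler is not documented.

```python
def next_interval(current_hours: float, changed: bool,
                  min_hours: float = 1.0, max_hours: float = 24.0 * 30) -> float:
    """Halve the revisit interval when the page changed, double it when it didn't."""
    proposed = current_hours / 2 if changed else current_hours * 2
    return max(min_hours, min(max_hours, proposed))


def too_volatile(change_log: list[bool], threshold: float = 0.9) -> bool:
    """True if the page changed on nearly every visit (hypothetical cutoff)."""
    return bool(change_log) and sum(change_log) / len(change_log) >= threshold


interval = 24.0
for changed in [True, True, False, True]:
    interval = next_interval(interval, changed)
    print(interval)
print(too_volatile([True] * 10), too_volatile([True, False] * 5))
```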

Quote:
Originally Posted by Wolf
Actually, syc's goal is to get the infamous "Whale Penis Thread" to be number one for whale penis, but would probably be satisfied with getting the church's webpage up there as well. Thanks for the analysis!
You're welcome. I'm your friend in the search business.

The CoWP states search ranking as its goal, so that's why I focused on that site.

The default posts-per-page setting for the Guest user (which is how a search spider sees the forum) is 20, so a search index will score each page of 20 posts separately. Once a thread grows past 20 posts, the relevance score of its first page won't change. You are better off starting a new thread, filling it with 20 posts with good keyword density, and linking to that page.

Adding new pages to a thread won't change the search ranking of previous pages of that thread.
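Here is a small sketch of why later posts don't help the first page. It assumes, as the post describes, that each 20-post page is crawled and scored as its own document. Appending posts only adds new pages, and the keyword count of page 1 is unchanged. The keyword-count "score" is a stand-in for illustration, not a real engine's relevance function.

```python
def paginate(posts: list[str], per_page: int = 20) -> list[list[str]]:
    """Split a thread into pages the way a guest (and a spider) would see them."""
    return [posts[i:i + per_page] for i in range(0, len(posts), per_page)]


def page_keyword_counts(posts: list[str], keyword: str, per_page: int = 20) -> list[int]:
    """Keyword occurrences per page; each page is treated as a separate document."""
    kw = keyword.lower()
    return [sum(p.lower().count(kw) for p in page) for page in paginate(posts, per_page)]


thread = ["whale penis"] * 5 + ["hello"] * 15  # exactly one full page
print(page_keyword_counts(thread, "Whale Penis"))
thread += ["Whale Penis whale penis"] * 10      # keyword-heavy posts land on page 2
print(page_keyword_counts(thread, "Whale Penis"))
```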
05-31-2005, 07:41 AM #13
Troubleshooter
The urban Jane Goodall
 
Join Date: Jan 2004
Location: Florida
Posts: 3,012
Quote:
Originally Posted by Gwennie!
Wow, MSN re-spidered these pages today. Now I'm attached to Troubleshooter.
Not anywhere that would require a BCS, as far as I can tell.
__________________
I have gained this from philosophy: that I do without being commanded what others do only from fear of the law. - Aristotle
06-21-2005, 05:57 PM #14
elSicomoro
Person who doesn't update the user title
 
Join Date: Jan 2001
Posts: 12,486
Finally...updated the COTWP webpage. Thanks for the help, RS!
06-22-2005, 05:08 AM #15
jaguar
whig
 
Join Date: Apr 2001
Posts: 5,075
I seem to remember some sites had to block the old MSNBot because it treated each session ID as a unique URL...
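The usual fix on the crawler side is URL normalization, which collapses every session-ID variant of a page into one canonical URL before deduplication. The sketch below strips session parameters, sorts the remaining query parameters, lowercases the scheme and host, and drops the fragment. The set of session parameter names is an assumption for illustration, and the example host and thread ID are made up. Lowercasing the whole netloc is fine for hostnames, but a fuller version would treat userinfo separately.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of query parameters that carry per-visitor session IDs.
SESSION_PARAMS = {"s", "sid", "sessionid", "phpsessid"}


def normalize(url: str) -> str:
    parts = urlsplit(url)
    # Drop Java-servlet-style path parameters such as ";jsessionid=...".
    path = re.sub(r";jsessionid=[^/?#]*", "", parts.path, flags=re.IGNORECASE) or "/"
    # Remove session parameters and sort the rest so parameter order doesn't matter.
    query = sorted(
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in SESSION_PARAMS
    )
    # Lowercase scheme and host; drop the fragment, which never reaches the server.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), ""))


a = normalize("HTTP://Cellar.Example/showthread.php?t=8457&s=abc123")
b = normalize("http://cellar.example/showthread.php?s=zzz999&t=8457")
print(a == b, a)
```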
__________________
Good friends, good books and a sleepy conscience: this is the ideal life.
- Twain
All times are GMT -5.