
robots.txt file question



Blackice
04-18-2006, 10:15 AM
I made a robots.txt file and uploaded it to my root folder.

Here it is:


User-agent: *
Disallow: /admincp/
Disallow: /cgi-bin/
Disallow: /clientscript/
Disallow: /includes/
Disallow: /install/
Disallow: /modcp/
Disallow: /subscription.php
Disallow: /payments.php
Disallow: /faq.php
Disallow: /calendar.php
Disallow: /search.php
Disallow: /private.php
Disallow: /online.php
Disallow: /sendmessage.php
Disallow: /sendmessage.php?do=
Disallow: /reputation.php
Disallow: /report.php
Disallow: /threadrate.php
Disallow: /showpost.php
Disallow: /postings.php
Disallow: /newthread.php
Disallow: /newreply.php
Disallow: /register.php
Disallow: /login.php
Disallow: /faq.php
Disallow: /image.php
Disallow: /cron.php
Disallow: /joinrequests.php
Disallow: /usercp.php
Disallow: /plugins/

But if you type site:hiphoprelated.com into Google, my register.php and faq.php are listed. Why is this?

I guess the main question is: what exactly does the robots.txt file do? :p I was under the impression it asked search engines not to crawl those pages.
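For reference, a Disallow value is matched as a path prefix, so Disallow: /sendmessage.php already covers every /sendmessage.php?do=... URL, and listing /faq.php twice changes nothing. The sketch below checks a few of the rules from the file above with Python's standard urllib.robotparser module. The host www.example.com and the bot name ExampleBot are placeholders, not anything from this thread.

```python
from urllib.robotparser import RobotFileParser

# A subset of the rules posted above, including the overlapping lines.
RULES = """\
User-agent: *
Disallow: /faq.php
Disallow: /sendmessage.php
Disallow: /sendmessage.php?do=
Disallow: /register.php
Disallow: /faq.php
"""

# Parse the text directly instead of downloading it (no network needed).
parser = RobotFileParser()
parser.parse(RULES.splitlines())

# "ExampleBot" and www.example.com are placeholders for illustration only.
for url in (
    "http://www.example.com/sendmessage.php?do=mailmember&u=1",
    "http://www.example.com/faq.php?faq=vb_faq",
    "http://www.example.com/showthread.php?t=1",
):
    verdict = "allowed" if parser.can_fetch("ExampleBot", url) else "blocked"
    print(verdict, url)
```

Both the sendmessage.php and faq.php URLs come out blocked by the shorter prefix rules alone, which is why the extra lines are harmless but unnecessary.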

Joeychgo
04-18-2006, 12:07 PM
Your robots.txt is fine.

That's exactly what it does. The robots.txt protocol is purely advisory: it relies on the cooperation of the web robot, so marking an area of your site out of bounds with robots.txt does not guarantee privacy. Many website administrators have been caught trying to use the robots file to make private parts of a website invisible to the rest of the world. However, the file is necessarily publicly available and can easily be read by anyone with a web browser.
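To make the "relies on cooperation" point concrete: the robots.txt check happens inside the crawler, not on your server. Below is a minimal sketch of what a well-behaved bot might do before fetching pages, using Python's standard urllib.robotparser. The function name polite_urls, the bot name ExampleBot and the URLs are hypothetical. A bot that simply never calls can_fetch() is not stopped by anything in the file.

```python
from urllib.robotparser import RobotFileParser

# A few of the rules from the robots.txt posted earlier in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admincp/
Disallow: /register.php
Disallow: /faq.php
Disallow: /search.php
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_urls(parser: RobotFileParser, user_agent: str, urls: list[str]) -> list[str]:
    """Hypothetical helper: keep only the URLs the rules allow.

    Nothing forces a crawler to call this; the filtering is voluntary,
    which is why robots.txt is advisory rather than access control.
    """
    return [u for u in urls if parser.can_fetch(user_agent, u)]


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    candidates = [
        "http://www.example.com/index.php",
        "http://www.example.com/register.php",
        "http://www.example.com/faq.php",
        "http://www.example.com/showthread.php?t=1",
    ]
    print(polite_urls(rp, "ExampleBot/1.0", candidates))
```

The polite bot skips register.php and faq.php; a bot that ignores robots.txt would fetch all four URLs, and the server would serve them.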


From the looks of it, those entries are on their way out of the Google index. That's why they have no cached tab or description. They were probably already indexed when you added the robots.txt. My guess is they'll be removed from the index completely before long.

Blackice
04-18-2006, 05:34 PM
Thanks for clearing that up Joey :D

Noppid
04-19-2006, 06:20 AM
Remember that robots.txt will stop the AdSense spider too. ;)
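If, as Noppid says, the catch-all rules also shut out the AdSense crawler, one common approach is to give that crawler its own group with an empty Disallow line, which means "nothing is disallowed". A robot follows the group that names it in preference to the * group. The sketch below assumes the AdSense crawler identifies itself as Mediapartners-Google; the host and the Googlebot comparison are only there to show the contrast, and the check uses Python's standard urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the AdSense crawler (assumed user-agent token
# "Mediapartners-Google") gets its own group with an empty Disallow, which
# allows everything. All other bots fall through to the "*" group.
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /faq.php
Disallow: /register.php
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

for agent in ("Mediapartners-Google", "Googlebot"):
    for path in ("/faq.php", "/showthread.php"):
        allowed = rp.can_fetch(agent, "http://www.example.com" + path)
        print(f"{agent:22} {path:16} {'allowed' if allowed else 'blocked'}")
```

With this layout Mediapartners-Google may fetch /faq.php while Googlebot is still blocked from it, so the other restrictions stay in place.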

Brandon Sheley
04-19-2006, 10:43 PM
Ah, I didn't think of that. Thanks for bringing it up, Noppid. I'll have to rethink my robots.txt.

Blackice
04-20-2006, 01:04 PM
Yeah, me too. Thanks for bringing that up.

GrendelKhan{TSU}
07-18-2006, 11:27 AM
Setting up your .htaccess file will give you better control alongside your robots.txt file and allow AdSense to properly crawl the content it needs.

There are lots of settings and ways around all of this... it takes some doing. Search; there are a bazillion topics and lots of info on this, and you'll come up with what you need to tweak. A lot depends on your setup as well. If you still have questions later, I'm sure Joey can help, or I will if I get time (multi-tasking 98792834723 different things right now though :()