Darwa
10-10-2006, 03:53 PM
Any idea why this thread is continually showing up in 'new posts' for me?
PS: Anyone else see this post at the top?
RE: Google CodeSearch - Please Note!

SEO Pirate
10-10-2006, 04:50 PM
What would be a common example of vBulletin zip/tar.gz files likely to be found in a web-accessible location?

Joeychgo
10-10-2006, 05:40 PM
It has come to our attention that Google's new 'Code Search' system has been finding a number of vBulletin zip files uploaded on various customers' sites and publishing the script contents, including... More... (http://www.vbulletin.com/forum/showthread.php?t=204310&goto=newpost)

minstrel
10-10-2006, 05:51 PM
Yikes! To add to Joey's excerpt:
...including sensitive data such as customer and license numbers. Please, do not leave your vBulletin zip/tar.gz files or site backups in a web-accessible location! If you do not ensure that your vBulletin zips are protected, not only are you leaking your own personal data, but you will also be in violation of the vBulletin license agreement by allowing your copy of the software to be downloaded by unauthorized parties.

minstrel
10-10-2006, 06:23 PM
Because there was a problem with the forum clock this morning and the timestamps were all wrong, like 12 hours ahead.

Hell³
10-10-2006, 06:36 PM
Quote, originally posted by SEO Pirate: What would be a common example of vBulletin zip/tar.gz files likely to be found in a web-accessible location?
When someone, for whatever purpose, uploads the distribution files to their web server. Or someone creates a site backup with a cron job and leaves the backup file inside the public web folder.
Does anyone know if the Google code spider is tagged differently than the regular Google spider?
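A minimal sketch for checking your own server, assuming Python 3: it walks a public web folder and lists files whose names end in common archive or dump suffixes, which is where uploaded distribution zips and cron-job backups tend to end up. The `find_exposed_archives` name and the suffix list are hypothetical; extend the list to match your own backup naming.

```python
# List archive/backup files under a public web folder: an illustrative sketch.
# Hypothetical: the function name and the suffix list; adjust to your own naming.
from pathlib import Path

RISKY_SUFFIXES = (".zip", ".tar.gz", ".tgz", ".tar", ".sql", ".sql.gz")


def find_exposed_archives(web_root):
    """Return sorted paths, relative to web_root, of files that look like archives or dumps."""
    root = Path(web_root)
    hits = []
    for path in root.rglob("*"):  # visit every file and folder below the web root
        # Compare case-insensitively so "BACKUP.ZIP" is caught too.
        if path.is_file() and path.name.lower().endswith(RISKY_SUFFIXES):
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)
```

For example, `find_exposed_archives("/var/www/html")` (a hypothetical path) returns a sorted list of relative paths; anything it reports should be moved above the web root or deleted.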
minstrel
10-10-2006, 06:48 PM
I believe since early this year Google's spiders have all been doing double duty, via the so-called crawl-caching proxies:

Matt Cutts: Google Crawl Caching Proxy (http://www.mattcutts.com/blog/crawl-caching-proxy/)

Several people have noticed content from other Google bots showing up in our main web index, and are wondering… why/how does that happen? Last week I was at WebmasterWorld Boston and I talked about this issue there, but I’d like to do a blog post about Google’s crawl caching proxy, because some people have questions about it.

First off, let me mention what a caching proxy is just to make sure that everyone’s aware. I’ll use an example from a different context: Internet Service Providers (ISPs) and users. When you surf around the web, you fetch pages via your ISP. Some ISPs cache web pages and then can serve that page to other users visiting the same page. For example, if user A requests www.cnn.com, an ISP can deliver that page to user A and cache that page. If user B requests www.cnn.com a second later, the ISP can return the cached page. Lots of ISPs and companies do this to save bandwidth. For example, Squid is one web proxy cache that is free and common that a lot of people have heard of.

As part of the Bigdaddy infrastructure switchover, Google has been working on frameworks for smarter crawling, improved canonicalization, and better indexing. On the smarter crawling front, one of the things we’ve been working on is bandwidth reduction. For example, the pre-Bigdaddy webcrawl Googlebot with user-agent “Googlebot/2.1 (+http://www.google.com/bot.html)” would sometimes allow gzipped encoding. The newer Bigdaddy Googlebots with user-agent “Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)” are much more likely to support gzip encoding. That reduces Googlebot’s bandwidth usage for site owners and webmasters.
From my conversations with the crawl/index team, it sounds like there’s a lot of head-room for webmasters to reduce their bandwidth by turning on gzip encoding. Another way that Bigdaddy saves bandwidth for webmasters is by using a crawl caching proxy. ...

In this example, if the blogsearch crawl or AdSense wants to fetch a page that the web crawl already fetched, it can get it from the crawl caching proxy instead of fetching more pages. That could reduce the number of pages fetched down to as little as 11. In the same way, a page that was fetched for AdSense could be cached and then returned to if the web crawl requested it. So the crawl caching proxy works like this: if service X fetches a page, and then later service Y would have fetched the exact same page, Google will sometimes use the page from the caching proxy. Joining service X (AdSense, blogsearch, News crawl, any Google service that uses a bot) doesn’t queue up pages to be included in our main web index. Also, note that robots.txt rules still apply to each crawl service appropriately. If service X was allowed to fetch a page, but a robots.txt file prevents service Y from fetching the page, service Y wouldn’t get the page from the caching proxy.

Finally, note that the crawl caching proxy is not the same thing as the cached page that you see when clicking on the “Cached” link by web results. Those cached pages are only updated when a new page is added to our index. It’s more accurate to think of the crawl caching proxy as a system that sits outside of webcrawl, and which can sometimes return pages without putting extra load on external sites.

Just as always, participating in AdSense or being in our blogsearch doesn’t get you any “extra” crawling (or ranking) in our web index whatsoever. You don’t get any extra representation in our index, you don’t get crawled/indexed any faster by our webcrawl, and you don’t get any boost in ranking.
This crawl caching proxy was deployed with Bigdaddy, but it was working so smoothly that I didn’t know it was live. That should tell you that this isn’t some sort of webspam cloak-check; the goal here is to reduce crawl bandwidth.
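To make the proxy behaviour in the quoted post concrete, here is a toy Python sketch, not Google's implementation: one in-memory cache is shared by several crawl services, a page fetched by one service is reused by another, and robots.txt is still checked per service before a cached copy is handed out. The `CrawlCachingProxy` class, the service names `webbot` and `blogbot`, and the `fetch` callback are all hypothetical; only `urllib.robotparser` is a real standard-library module.

```python
# Toy crawl caching proxy: an illustrative sketch, not Google's implementation.
# Hypothetical: the class, the service names, the dict cache and the fetch callback.
from urllib.robotparser import RobotFileParser


class CrawlCachingProxy:
    def __init__(self, robots_txt, fetch):
        # Parse the site's robots.txt once; rules are checked per service in get().
        self.robots = RobotFileParser()
        self.robots.parse(robots_txt.splitlines())
        self.fetch = fetch          # callable(url) -> body, stands in for a real HTTP GET
        self.cache = {}             # url -> body, shared by every crawl service
        self.origin_fetches = 0     # requests that actually reached the site

    def get(self, service, url):
        # robots.txt still applies to each service: a blocked service gets
        # nothing, even if another service already cached the page.
        if not self.robots.can_fetch(service, url):
            return None
        if url not in self.cache:
            # Cache miss: fetch from the site once and keep the copy.
            self.cache[url] = self.fetch(url)
            self.origin_fetches += 1
        # Served from the cache: no extra load on the site.
        return self.cache[url]


ROBOTS_TXT = """\
User-agent: blogbot
Disallow: /private/

User-agent: *
Allow: /
"""
```

With `ROBOTS_TXT` above, a page that `webbot` fetched is handed to `blogbot` from the cache only if it lies outside `/private/`, and each allowed URL costs the site a single request however many services ask for it.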
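Turning on gzip encoding, as suggested in the quote, comes down to content negotiation: the server compresses the body only when the client's `Accept-Encoding` header lists gzip. A simplified Python sketch follows; the function names are hypothetical, the `*` wildcard is not handled, and in practice you would usually enable this in the web server itself (for example Apache's mod_deflate) rather than in application code.

```python
# Simplified gzip content negotiation: an illustrative sketch.
# Hypothetical: the function names; the "*" wildcard in Accept-Encoding is not handled.
import gzip


def accepts_gzip(accept_encoding):
    """True if the Accept-Encoding header lists gzip with a non-zero q-value."""
    for part in accept_encoding.split(","):
        fields = [f.strip().lower() for f in part.split(";")]
        if fields[0] != "gzip":
            continue
        q = 1.0  # no q parameter means full preference
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        return q > 0
    return False


def encode_response(body, accept_encoding):
    """Return (payload bytes, response headers) for a text body."""
    raw = body.encode("utf-8")
    if accepts_gzip(accept_encoding):
        payload = gzip.compress(raw)
        return payload, {"Content-Encoding": "gzip", "Content-Length": str(len(payload))}
    return raw, {"Content-Length": str(len(raw))}
```

A request sent with `Accept-Encoding: gzip, deflate` gets the compressed body plus a `Content-Encoding: gzip` header; one that does not accept gzip gets the page unchanged.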