Programming Spiders, Bots, and Aggregators in Java
Author: Jeff Heaton
Published: 2002-02
List price: $59.99
Our price: $114.34
As of: August 20th, 2008 09:06:22 AM
Customer comments on this selection.
Lots of working code but not much of a tutorial
Bots are the simplest form of Internet-aware programs in that they simply carry out a repetitive task once unleashed on the web. A spider travels the web in a complex fashion, moving from one part of the World Wide Web to another, collecting information from one site and then jumping to another based on that information. An aggregator is a bot that is designed to log into several user accounts and retrieve similar information.
If you need a complete bot, spider, or aggregator written in Java, complete with source code and a detailed manual about that source code so that you can customize it to suit your needs, this is a five star book. However, if you are looking for a book about information storage and retrieval and network programming that focuses on the theory of operation of such software with application code written in Java, you will be sorely disappointed.
The author did such a fine job of documenting his work with excellent diagrams, comments, and the book that reads like a user's manual, that I easily took his Web spider code and modified it to perform many additional tasks that his basic package does not do. All of the hooks are available in his code for you to modify it to collect or examine just about any kind of data accessible via the web.
I highly recommend this book if you are taking an information storage and retrieval class and you would like to read and study something applied on spiders, bots, and aggregators versus the theory you get in most textbooks. Just understand you are getting code plus a user's manual, not a tutorial. You are definitely going to need other resources on Java network programming if you want to study, understand, or modify the included source code. I suggest the latest edition of "Java Network Programming" by Elliotte Rusty Harold for help with the network programming part of bots, spiders, and aggregators. I also suggest you look at "Spidering Hacks", which has many good ideas for features you can add to your web spider.
Not much information for such a long book
The essence of this book could probably have been compressed into a few chapters. I read the whole thing in about a day, skimming over many sections (e.g. the structure of HTML, including discussion of anchor tags) that I, like most programmers, already know well. I think I would have preferred a focussed tutorial on Heaton's Bot package instead of a detailed but boring treatment of every technology (however elementary) used in the process of constructing spiders and bots.
Aside from this, Heaton is not a great writer. Attempting to be particularly organized and structured, he comes off as excessively stiff; I stopped counting the number of times he wrote "I will now show how to..." I purchased this book expecting the process of constructing a spider or bot to draw on a range of specialized skills, but it appears to be quite simple: basic knowledge of Java network programming (i.e. sockets), HTTP, HTML and XML parsing would appear to suffice. I'm sure there is all sorts of complex stuff Heaton does not talk about, but I wish he had! At the moment I'm wondering whether this book deserves a space on my finite bookshelf.
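To illustrate the reviewer's point that HTML parsing for a spider can stay fairly basic, here is a minimal sketch of pulling anchor links out of a page and resolving them against the page's URL. It is not taken from the book or from Heaton's bot package; the class name `LinkExtractor`, the method `extractLinks`, and the regular expression are assumptions made for illustration, and a real spider would likely use a proper HTML parser rather than a regex.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any published bot package.
class LinkExtractor {
    // Simplified pattern: finds href="..." or href='...' inside an <a ...> tag.
    // It ignores unquoted attribute values and other HTML edge cases.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns absolute http/https links found in html, resolved against baseUrl.
    static List<String> extractLinks(String baseUrl, String html) {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(2).trim();
            if (href.isEmpty() || href.startsWith("#")) {
                continue; // skip empty links and same-page fragments
            }
            try {
                URI resolved = base.resolve(href);
                String scheme = resolved.getScheme();
                if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException e) {
                // malformed href: ignore it and keep scanning
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/about\">About</a> <a href='page2.html'>Next</a>";
        for (String link : extractLinks("http://example.com/docs/index.html", html)) {
            System.out.println(link);
        }
    }
}
```

Relative links are resolved with `URI.resolve`, so a spider built on this sketch can queue absolute URLs without handling path arithmetic itself.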
Create an Object-Oriented Bot Package Step by Step
I use this book as a supplement to a class that I teach, as it gives the students the necessary skills to programmatically spider, and generally access, information on the Net.
As some of the other reviewers point out, this book does center around the creation of a "bot package". However, I see this as one of the book's greatest strengths. The author explains step by step how to take basic concepts and continually build upon them, progressing onward to more complex spiders and bots. Specifically:
1. Create an advanced HTTP object that overcomes many of the shortcomings of the one built into Java (namely cookie support, referrer support, HTTP authentication, and more).
2. Add forms/page processing on top of the HTTP object. You are shown step by step how to process the data you collect from step 1.
3. Create a bot that wields the page/form processing created in step 2.
4. Create a spider that, using steps 1-3, can access pages across an entire site.
5. Expand the spider to support thread pooling and a JDBC database.
Rather than providing a bunch of disjoint code samples, as many books do, the author guides you step by step through the above path, revealing the techniques at every step. For the reader who does not care about the intricate nature of bot programming (sadly, some of my students), you can skip to the API documentation and get right to creating your own bots. You can also download updated versions of the "bot package" from the author's site. I actually did this before buying the book.
One downside to the book is the example programs' use of GUIs. I would rather every example had been straight console; the GUI only gets in the way for a book targeting bot programming. Also, the author very annoyingly puts an underscore in front of every class-instance variable, which gives some of the code something of a C++ look, I suppose.
If you are already programming bots and spiders of your own, I don't think you will get much more from this book than you are likely already doing. But for someone who wants to get started in this exciting area, there is nothing else like it, and I highly recommend it.
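The idea of a spider that can reach pages across an entire site, mentioned in the review above, can be sketched as a breadth-first walk that stays on the starting host. This is only an illustration under stated assumptions, not the design used in the book: `SiteSpider`, `crawl`, and the `linksOf` function (which stands in for "fetch the page and extract its links") are hypothetical names, and thread pooling and database storage are left out to keep the sketch short.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical single-threaded spider; not the book's bot package.
class SiteSpider {
    // Visits pages breadth-first starting at startUrl, following only links
    // on the same host, and stops after maxPages pages. linksOf is assumed
    // to return the absolute links found on a given page.
    static List<String> crawl(String startUrl,
                              Function<String, List<String>> linksOf,
                              int maxPages) {
        String host = URI.create(startUrl).getHost();
        Set<String> seen = new HashSet<>();   // URLs already queued or visited
        Deque<String> queue = new ArrayDeque<>();
        List<String> visited = new ArrayList<>();
        seen.add(startUrl);
        queue.add(startUrl);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();
            visited.add(url);
            for (String link : linksOf.apply(url)) {
                String linkHost;
                try {
                    linkHost = URI.create(link).getHost();
                } catch (IllegalArgumentException e) {
                    continue; // malformed link: skip it
                }
                // Stay on the starting site and never queue a URL twice.
                if (host != null && host.equalsIgnoreCase(linkHost) && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // An in-memory "site" replaces real HTTP requests in this demo.
        Map<String, List<String>> site = Map.of(
                "http://example.com/a", List.of("http://example.com/b", "http://other.org/x"),
                "http://example.com/b", List.of("http://example.com/a"));
        System.out.println(crawl("http://example.com/a",
                url -> site.getOrDefault(url, List.of()), 10));
    }
}
```

Passing the page-fetching step in as a function keeps the traversal logic separate from networking, so the same loop can be exercised against an in-memory map or wired to a real HTTP client.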
Misleading Title
As another reviewer commented, this book should be called "Using the com.heaton.bot Package: API Reference". All you learn is how to use this package of Java classes, not how to actually create spiders, bots, or aggregators from the ground up. I feel the title is misleading for such an expensive book. The only way I will learn what I want is to read the author's source code, which, by the way, is very ugly, however functional.
Happy
Visual Cafe produces the Swing so one can view the examples from the book. So what? When beginning to program with HTTP protocols, it's easy to enter incorrect methods and parameters that lead to dead-ends and frustration. As I learn about and use the Heaton API, I am pleasantly surprised with the methods available and how easily they're implemented and that they lead to success. The source code is included on the CD, with updated versions at the Heaton website.