Web Crawler in Java – Darcy Ripper

1 Overview
1.1 About
Darcy Ripper is a powerful Java web crawler (web spider) with strong workload and
speed capabilities. It is a standalone, multi-platform graphical user interface (GUI) application that
both casual users and programmers can use to download web resources on the fly.
Based on proven Java technology, the intuitive Darcy GUI is easy to use and provides robust
functionality for creating and running simple or complex download jobs.
1.2 Features and Benefits
Darcy Ripper offers a long list of features that improve the efficiency of the download process in terms
of processing time, network time, memory use and accuracy.
Graphical User Interface
– Multi-platform;
– Real-time view of the download job progress;
– Pause/Resume/Stop download job anytime;
– Save and Load download job template files;
– Regular Expression Editor;
– Check for Updates support;
– Online Help and support.
General Download Features
– Multithreaded – configurable number of download jobs that run in parallel at any given
time;
– Memory control options – user can control what happens to download jobs after they finish;
– Multiple starting points (URLs) for a download job – the user can specify multiple hosts on which
a download job can run.
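To illustrate the idea of a configurable number of parallel downloads over several starting URLs, here is a minimal sketch. It is not Darcy Ripper's code: the class name ParallelDownloadSketch, the method downloadAll and the fetcher parameter are all hypothetical, and the fetcher stands in for whatever actually retrieves a page.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical sketch; this is NOT Darcy Ripper's implementation.
final class ParallelDownloadSketch {

    // Applies fetcher to every start URL using at most maxParallel worker threads.
    // The fetcher must return a non-null body (ConcurrentHashMap rejects nulls).
    static Map<String, String> downloadAll(List<String> startUrls,
                                           int maxParallel,
                                           Function<String, String> fetcher) {
        Map<String, String> results = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        for (String url : startUrls) {
            // Each URL becomes one task; the pool size caps how many run at once.
            pool.execute(() -> results.put(url, fetcher.apply(url)));
        }
        pool.shutdown(); // accept no new tasks, let queued ones finish
        try {
            // Assumed upper bound on total run time, for illustration only.
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return results;
    }
}
```

A caller would pass the list of starting URLs, the desired thread count, and a function that performs the real HTTP request. Using a fixed-size pool is one simple way to cap parallelism; the real application may do this differently.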
HTTP Connection Features
– HTTP/HTTPS support;
– GZip compression support;
– HTTP Proxy support;
– WWW Authentication support;
– Cookies support;
– Request customization support: referrer behavior, configurable user agent name;
– HTTP response code analysis and configurable behavior;
– Connection limits support: maximum number of connections per server, retry count
control, bandwidth limitation, limitation depending on the HTTP response code.
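The response code analysis and retry control listed above could be modeled roughly as follows. This is only a sketch under stated assumptions: the Action values, the choice of which status codes count as transient, and the decide method are hypothetical and do not describe how Darcy Ripper actually reacts to a given code.

```java
// Hypothetical sketch; the mapping of codes to actions is an illustrative assumption.
final class ResponsePolicySketch {

    enum Action { SAVE, FOLLOW_REDIRECT, RETRY, SKIP }

    // Decides what to do with a response, given its HTTP status code,
    // the number of retries already made, and the configured retry limit.
    static Action decide(int statusCode, int retriesSoFar, int maxRetries) {
        if (statusCode >= 200 && statusCode < 300) {
            return Action.SAVE; // success: keep the resource
        }
        if (statusCode == 301 || statusCode == 302 || statusCode == 303
                || statusCode == 307 || statusCode == 308) {
            return Action.FOLLOW_REDIRECT; // redirect to a new location
        }
        // Assumed "transient" failures: timeout, rate limiting, server errors.
        boolean transientError =
                statusCode == 408 || statusCode == 429 || statusCode >= 500;
        if (transientError && retriesSoFar < maxRetries) {
            return Action.RETRY;
        }
        return Action.SKIP; // permanent failure or retry budget exhausted
    }
}
```

For example, with a retry limit of 3, a 503 response is retried until three retries have been made and is skipped after that, while a 404 is skipped immediately.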
Download Control Features
– Maximum search depth support;
– Maximum number of followed links support;
– Maximum time limit support;
– Downloaded file size limit support;
– Followed URL prefix support;
– Host Name limitation support;
– Save to Disk limitation support;
– Response behavior limitation matching response header with regular expressions;
– Response behavior limitation matching response content with regular expressions;
– Downloaded file content limitation support.
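Several of the download controls above (search depth, URL prefix, host name, content matching with regular expressions) amount to simple checks applied before a link is followed or a file is saved. The sketch below shows one way such checks might be combined. The class DownloadLimitsSketch and its methods are hypothetical names chosen for illustration, not part of Darcy Ripper.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch; names and rules are illustrative assumptions.
final class DownloadLimitsSketch {
    private final int maxDepth;
    private final Set<String> allowedHosts; // lower-case host names
    private final String urlPrefix;
    private final Pattern contentPattern;

    DownloadLimitsSketch(int maxDepth, Set<String> allowedHosts,
                         String urlPrefix, String contentRegex) {
        this.maxDepth = maxDepth;
        this.allowedHosts = allowedHosts;
        this.urlPrefix = urlPrefix;
        this.contentPattern = Pattern.compile(contentRegex);
    }

    // A link is followed only if it is within the depth limit,
    // on an allowed host, and under the configured URL prefix.
    boolean shouldFollow(URI url, int depth) {
        String host = url.getHost();
        return depth <= maxDepth
                && host != null
                && allowedHosts.contains(host.toLowerCase(Locale.ROOT))
                && url.toString().startsWith(urlPrefix);
    }

    // A file is saved only if its content contains a match for the regex.
    boolean shouldSave(String content) {
        return contentPattern.matcher(content).find();
    }
}
```

A job limited to depth 2, the host example.com and the prefix https://example.com/docs/ would follow https://example.com/docs/a.html at depth 1 but reject the same URL at depth 3, as well as any URL on another host or outside the prefix.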
