For example, suppose I want to download (save to disk) only the PDF files from a website.
To download only certain files based on their extension, Darcy must still process all (or most) of the files, but it saves only the chosen files to disk.
This is done by specifying regular expressions that match only the files that are meant to be saved to disk, under “Job Package Settings -> Simple Rules -> Save To Disk Filter”.
A regular expression that matches PDF files looks as follows:
Note that the regular expression is not case sensitive.
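As a rough illustration of how such a filter behaves, here is a minimal Python sketch. The pattern, the `should_save` helper, and the example URLs are assumptions made for this illustration and are not taken from Darcy Ripper. The `re.IGNORECASE` flag stands in for the case-insensitive behavior mentioned above. This text does not say whether Darcy matches the whole URL or only part of it, so check any pattern in the tool before relying on it.

```python
import re

# Assumed pattern (hypothetical, not Darcy Ripper's own): any URL that ends
# in ".pdf", optionally followed by a query string such as "?download=1".
PDF_PATTERN = re.compile(r".*\.pdf(\?.*)?", re.IGNORECASE)


def should_save(url: str) -> bool:
    """Return True if the whole URL matches the PDF pattern."""
    return PDF_PATTERN.fullmatch(url) is not None


if __name__ == "__main__":
    # Hypothetical URLs used only to exercise the pattern.
    for url in (
        "http://example.com/docs/manual.PDF",
        "http://example.com/docs/manual.pdf?download=1",
        "http://example.com/pdf/index.html",
    ):
        print(url, should_save(url))
```

Because the match is anchored to the whole URL, a page that merely contains "pdf" somewhere in its path, such as the third URL, is not selected.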
For multiple extensions, you can either enter several such rules or enter a single rule as follows:
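One way to fold several extensions into a single rule is an alternation group. The Python sketch below is illustrative only: the pattern, the chosen extensions (`pdf`, `doc`, `docx`, `zip`), and the `matching_urls` helper are assumptions, not rules shipped with Darcy Ripper.

```python
import re

# Assumed single-rule pattern (hypothetical): one alternation group covers
# several extensions; "docx?" matches both ".doc" and ".docx".
MULTI_PATTERN = re.compile(r".*\.(pdf|docx?|zip)(\?.*)?", re.IGNORECASE)


def matching_urls(urls):
    """Return the URLs that the combined rule would select, in input order."""
    return [u for u in urls if MULTI_PATTERN.fullmatch(u)]


if __name__ == "__main__":
    # Hypothetical URLs used only to exercise the pattern.
    sample = [
        "http://example.com/report.pdf",
        "http://example.com/letter.DOCX",
        "http://example.com/archive.zip",
        "http://example.com/index.html",
    ]
    for url in matching_urls(sample):
        print(url)
```

A single combined rule keeps the filter list short. Separate rules, one per extension, are easier to enable or disable individually.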
Note that Darcy Ripper includes a utility for testing regular expressions: “Menu Bar -> Utilities -> Regular Expression Tester”.