Website Crawler
One of my customers wanted a tool to crawl an entire website and collect all the links matching a given regular expression. So I came up with this website crawler application.
Because the customer also had a technical background, I could build a GUI that directly accepts the starting point (the start URL) and the regular expression to check against. Once the application starts crawling the given website, it checks each hyperlink and, if the link matches the given expression, adds it to a result bucket. At the end, it exports the result bucket as a CSV file.
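The matching and export steps can be sketched roughly as follows. This is only an illustration in Python, not the application's actual code: the names `LinkParser`, `matching_links`, and `bucket_to_csv` are hypothetical, and the choices to resolve relative links against the page URL and to skip duplicates are assumptions.

```python
import csv
import io
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def matching_links(html, base_url, pattern):
    """Return the absolute links in html that match the regular expression."""
    parser = LinkParser()
    parser.feed(html)
    regex = re.compile(pattern)
    bucket = []
    for href in parser.links:
        # Assumption: relative links are resolved against the page URL.
        absolute = urljoin(base_url, href)
        # Assumption: each matching link is stored only once.
        if regex.search(absolute) and absolute not in bucket:
            bucket.append(absolute)
    return bucket


def bucket_to_csv(bucket):
    """Serialize the result bucket as CSV text, one link per row."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["url"])  # header row (hypothetical column name)
    for url in bucket:
        writer.writerow([url])
    return out.getvalue()
```

A full crawler would call `matching_links` on every downloaded page, queue the links that stay on the same site for further crawling, and write the output of `bucket_to_csv` to a file once the queue is empty.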
Some websites detect automated programs by observing the time between consecutive requests for different pages on the site. If they suspect that an IP address is sending automated requests, they simply refuse to serve it the content. This application is designed to work in such situations as well: it waits a random number of seconds (3 to 7) before sending the next request, and so imitates human browsing.
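The random wait can be sketched in a few lines of Python. Again, this is an illustration under assumptions rather than the application's real code: the function name `polite_delay` is hypothetical, and whether the original picks whole or fractional seconds is not stated, so fractional values are assumed here. The `sleep` parameter exists only so the pause can be replaced in tests.

```python
import random
import time


def polite_delay(min_seconds=3.0, max_seconds=7.0, sleep=time.sleep):
    """Pause for a random duration between the two bounds and return it."""
    # Assumption: a uniformly distributed, fractional number of seconds.
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

Calling `polite_delay()` before each page request spaces the requests irregularly, which is the behavior described above.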