Sunday 20 May 2012

Key Word Web Parser Tool (Selenium Vs WinHttp)

In this post I want to highlight two interesting techniques for parsing web text from VBA: the Selenium framework and WinHttp.

Task:

Given a set of key words (a comma-separated list), we want to count the occurrences of each key word in the HTML body of a web page. Repeated over a set of web pages, this gives us the popularity of each key word within that set.
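The task above can be sketched in a few lines of VBA. This is only a minimal illustration, not the code from the downloadable tools: the names CountKeyword and TallyKeywords are hypothetical, and the choice of a case-insensitive, non-overlapping match is an assumption.

```vb
Option Explicit

' Hypothetical helper: count non-overlapping, case-insensitive
' occurrences of keyword in source by stepping through it with InStr.
Function CountKeyword(ByVal source As String, ByVal keyword As String) As Long
    Dim pos As Long
    Dim n As Long

    If Len(keyword) = 0 Then Exit Function   ' nothing to search for, returns 0

    pos = InStr(1, source, keyword, vbTextCompare)
    Do While pos > 0
        n = n + 1
        ' Continue searching just past the match that was found.
        pos = InStr(pos + Len(keyword), source, keyword, vbTextCompare)
    Loop

    CountKeyword = n
End Function

' Hypothetical helper: print one "keyword: count" line for each entry
' of a comma-separated key word list.
Sub TallyKeywords(ByVal html As String, ByVal keywordList As String)
    Dim part As Variant
    For Each part In Split(keywordList, ",")
        Debug.Print Trim$(part) & ": " & CountKeyword(html, Trim$(part))
    Next part
End Sub
```

Calling TallyKeywords with the page text and a list such as "excel, vba, macro" writes the counts to the Immediate window.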

Solution One:

Selenium, as it is popularly known, is a very robust web testing framework used for writing automated tests for web UIs. I recently stumbled upon a very interesting project: a wrapper for using Selenium from VBA, which exposes Selenium's functionality to VBA with very strong integration. The project home page (Link1 under References) describes its features well.

 

Solution Two:

Using the native WinHttp component from VBA, we download the HTML text of the web page with a “GET” request and then count the key word occurrences in it.
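A download step of this kind might look like the sketch below. It uses the WinHttp.WinHttpRequest.5.1 COM object with late binding, so no project reference is needed. The function name FetchHtml and the decision to raise an error on any status other than 200 are assumptions made for illustration, not details taken from the downloadable tool.

```vb
Option Explicit

' Hypothetical helper: download the raw HTML of a page with a
' synchronous GET request through WinHttp.
Function FetchHtml(ByVal url As String) As String
    Dim http As Object
    Set http = CreateObject("WinHttp.WinHttpRequest.5.1")

    http.Open "GET", url, False      ' False = wait for the response
    http.Send

    If http.Status <> 200 Then
        Err.Raise vbObjectError + 513, "FetchHtml", _
                  "HTTP " & http.Status & " returned for " & url
    End If

    FetchHtml = http.ResponseText    ' response body as text
End Function
```

The returned string is the markup as served, before any JavaScript runs, which is part of why this route is so much lighter than driving a browser.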

 

Comparison:

For the task at hand, I found that the WinHttp solution wins over the Selenium-based approach purely on performance, because of the overhead that Selenium incurs.

The Selenium framework still has its own important place: automating Web UI tasks when JavaScript is in play, as it is on most modern web pages. With it, one can script all the user actions that have to be performed on a page, such as filling in forms, clicking buttons and checking options, with the power of VBA.

Note: There is a large difference in count values between the Selenium-based and WinHttp-based tools, because for the former I am using the InStr function to get the key word count, whereas for the latter I am using a custom-written function.
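Neither counting routine is shown in this post, so the exact cause of the gap cannot be confirmed here. One way two counting methods can disagree on the same text is the compare mode: VBA string functions such as InStr and Replace use a binary (case-sensitive) comparison by default, unless the module declares Option Compare Text. The sketch below uses a hypothetical CountByReplace function and a made-up sample string to show the effect.

```vb
Option Explicit

' Hypothetical helper: count matches by measuring how much shorter the
' text becomes once every occurrence of keyword is removed.
Function CountByReplace(ByVal source As String, ByVal keyword As String, _
                        ByVal compareMode As VbCompareMethod) As Long
    If Len(keyword) = 0 Then Exit Function   ' nothing to search for, returns 0
    CountByReplace = (Len(source) - _
        Len(Replace(source, keyword, "", 1, -1, compareMode))) \ Len(keyword)
End Function

Sub CompareCounts()
    ' Made-up sample text, for illustration only.
    Const SAMPLE As String = "<b>VBA</b> macros: vba, Vba and VBA again"

    Debug.Print CountByReplace(SAMPLE, "VBA", vbBinaryCompare)  ' prints 2 (case-sensitive)
    Debug.Print CountByReplace(SAMPLE, "VBA", vbTextCompare)    ' prints 4 (case-insensitive)
End Sub
```

Using the same counting function and the same compare mode in both tools would make their numbers directly comparable.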

Download Solution
Download solution


References:
Link1: http://code.google.com/p/selenium-vba/
