Sunday 20 May 2012

Key Word Web Parser Tool (Selenium Vs WinHttp)

With this post I wanted to highlight two interesting techniques for web text parsing via VBA using Selenium framework and WinHttp.

Task:

Given the set of key words (comma separated list) we want to count the occurrence of the key word in the html body of the web page (giving us the popularity of the key word within the set of web pages).

Solution One:

Selenium” as popularly known in the market, is a very robust web testing framework and used in writing automated testing for Web UI. So recently I stumbled upon a very interesting project which is a wrapper written for using selenium via VBA exposing very strong integration of selenium functionalities with VBA. The project home page is very good in describing upon the features of the same.

 

Solution Two:

Using the native WinHttp solution in VBA, we are downloading the HTML text of the web page via “GET” command and thus counting the key word occurrence in the same.

 

Comparison:

For the task in hand I found that using the WinHttp solution wins over the Selenium based methodology purely by the performance and over head the Selenium incurs for the same.

The Selenium framework has its own very important place to automate the Web UI task in hand when JavaScript is also in play with modern day web pages. Selenium plays a very important role where one can script all the user actions that have to be performed on the web, like filling forms, clicking buttons, checking options and more with the power of VBA.

Note: There is a large difference in count values between the Selenium and WinHttp based tools, for the reason that for former I am using Instr function to get the key word count, where as for the later I a using custom written function for the key word count.

Download Solution
Download solution


References:
Link1: http://code.google.com/p/selenium-vba/

Friday 4 May 2012

Excel VBA Data Warehouse Generator Tool

Recently I was observing our database development team, and I observed that much of my colleagues were struggling with lots of boiler plate code. In the process of generating Dimension and Fact tables (in snow flaked schema), I thought it would be nice to create an easy to use tool to generate database schema with the following rules in place:

1. Each dimension table would have a primary key, combined with identity auto generation.
2. A dimension table may have referential constraint upon another dimension table
3. A fact table will have multiple referential constraints from multiple dimension tables but not from any other factual table.
4. A fact table will not have a primary key nor an identity column.

This tool is primarily an assisting tool for repetitive process occurred in the data warehouse table structure generation, though for any other design requirements developer intervention is very much required to meet the special conditions.



To use the tool, there is a main page where all the relevant information needs to be set like connection string, script folder path and updated database on which the operation needs to be performed once the list is refreshed.

Control Page:


Dimension Page:


Fact Page:


Then there is a SQL Server Management Studio style dimension and fact table generation worksheets where user can set the desired columns required according to the business needs along with integrated drop down list on the Table Link column where the refrence dimension tables can be selected(The table name for dimension and fact wil be appended by keywords "Dim" and "Fact" respectivly. And also for the dimension tables the key columns as they will be auto generated via VBA and appended terms like "Id").

Download Solution
Download solution

File - Directory (Tree Map) in Excel

Being a software developer and in my quest of achieving an organised methodology for saving my personal files, I many times end up saving same file in multiple locations. Ahh…

So many times we can’t keep track of how our file structure if expanding underlying the cover of multiple nested folders. So, I thought to develop a tool in Excel/VBA which could visually represent me the file directory structure in the form of tree and also could provide me an easy to navigate mechanism for the nodes I wish to explore.

Some thing like this …



This tool request for the root node location to start its parsing and with the condition provided by the user to layout the sub folder (recursively) by a check box you will achieve the above result.

Public Sub FileLister(rootPath As String, exploreSubFolder As Boolean)

    Dim file As Scripting.file, folder As Scripting.folder, subfolder As Scripting.folder

    If fso Is Nothing Then
        Set fso = New FileSystemObject
    End If

    Set folder = fso.GetFolder(rootPath)

    If (folder.SubFolders.Count > 0) And exploreSubFolder Then
    
        For Each subfolder In folder.SubFolders
            
            nestLevel = nestLevel + 1
            ReDim Preserve navigated(nestLevel)
            
            If navigated(nestLevel) = 0 Then
                For Each file In folder.Files
                    'Record files in folder
                    anchorRange.Offset(printRow, nestLevel + 1).Value = file.Name
                    anchorRange.Offset(printRow, 0).Hyperlinks.Add Control.Range("A12").Offset(printRow, 0), file.path, , , "Link"
                    printRow = printRow + 1
                Next file
            End If
            
            anchorRange.Offset(printRow, nestLevel + 1).Value = subfolder.Name
            anchorRange.Offset(printRow, nestLevel + 1).Interior.Color = 10092543  'Light Yellow
            anchorRange.Offset(printRow, 0).Hyperlinks.Add Control.Range("A12").Offset(printRow, 0), subfolder.path, , , "Link"
            printRow = printRow + 1

            navigated(nestLevel) = 1
            FileLister subfolder.path, exploreSubFolder
            nestLevel = nestLevel - 1
            
        Next subfolder
        
    Else
    
        nestLevel = nestLevel + 1
        For Each file In folder.Files
            'Record files in folder
            anchorRange.Offset(printRow, nestLevel + 1).Value = file.Name
            anchorRange.Offset(printRow, 0).Hyperlinks.Add Control.Range("A12").Offset(printRow, 0), file.path, , , "Link"
            printRow = printRow + 1
        Next file
        For Each subfolder In folder.SubFolders
            anchorRange.Offset(printRow, nestLevel + 1).Value = subfolder.Name
            anchorRange.Offset(printRow, nestLevel + 1).Interior.Color = 10092543  'Light Yellow
            anchorRange.Offset(printRow, 0).Hyperlinks.Add Control.Range("A12").Offset(printRow, 0), subfolder.path, , , "Link"
            printRow = printRow + 1
        Next subfolder
        nestLevel = nestLevel - 1
        
        If exploreSubFolder Then
            navigated(nestLevel) = 0
        End If
        
    End If
End Sub


Download:

Download Solution
Download solution