
HTML Parsing and Selenium

HTML parsing has always been a pressing requirement with Selenium. Although Selenium does not have a built-in API for HTML parsing, it integrates easily with other Java libraries, so it can be combined with an HTML parser to achieve the same result. I have experimented with HTML parsing using Jericho, a Java library. The only input Jericho needs to begin parsing is the HTML source, which can be obtained with the Selenium RC API method getHtmlSource(). Below are the functions I have developed using Jericho:

Count the number of tables on a page:

// Parse the page's HTML source with Jericho
Source source = new Source(selenium.getHtmlSource());
// Collect every TABLE element on the page
List<Element> table = source.getAllElements(HTMLElementName.TABLE);

Reporter.log("Number of tables is: " + table.size());

Note: Reporter is TestNG's reporting API (org.testng.Reporter).
Retrieve table data:

// Retrieve table data from a specific table (index 3, i.e. the fourth table on the page)
Source tableSource = new Source(table.get(3).toString());

Reporter.log("Table data is: " + HTMLTableParser.getTableData(tableSource, false));
Reporter.log("Raw table data is: " + HTMLTableParser.getTableData(tableSource, true));

The definition of getTableData is as follows:


* Returns the Segment or content of HTML table

* available between Start and End tag


* @param tableSource

* @param rawHTMLData


* @return HTML Table data


public static List getTableData(Source tableSource, Boolean rawHTMLData) {

// Table data to be returned

List tableData = new ArrayList ();

// Collect table rows

List tableRows = tableSource.getAllElements(HTMLElementName.TR);

// Loop through table rows

for (int tableRowIndex=0; tableRowIndex data = tableRow.getAllElements(HTMLElementName.TD);

// Loop through table columns

for(int tableColummnIndex=0; tableColummnIndex tableRows = tableSource.getAllElements(HTMLElementName.TR);

return tableRows.size();


Count the number of columns in individual rows:
Map<Integer, Integer> rowAndColumnCount = HTMLTableParser.countTableColumnsInRows(tableSource);

for (Map.Entry<Integer, Integer> rowAndColumnData : rowAndColumnCount.entrySet()) {
    Reporter.log("Number of columns in row " + rowAndColumnData.getKey()
            + ": " + rowAndColumnData.getValue());
}

Retrieve data from a specific column:

// Get data from column 1, starting at row 0
Reporter.log("Column specific table data is: " + HTMLTableParser.getTableDataForColumn(tableSource, false, 0, 1));

Reporter.log("Column specific raw table data is: " + HTMLTableParser.getTableDataForColumn(tableSource, true, 0, 1));
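The post does not show the body of countTableColumnsInRows. As a rough sketch of what it presumably computes, the stdlib-only version below works on a hypothetical in-memory table (a List of rows, each a List of cell strings) standing in for Jericho's TR and TD elements, and maps each row index to its number of cells. The class name ColumnCountSketch, the method name tableColumnCounts, and the in-memory model are assumptions for illustration; with Jericho, each inner list would correspond to tableRow.getAllElements(HTMLElementName.TD).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColumnCountSketch {

    // Hypothetical helper: maps each row index to the number of cells in that row.
    // Each inner list stands in for the TD elements of one TR element.
    static Map<Integer, Integer> tableColumnCounts(List<List<String>> rows) {
        // LinkedHashMap keeps rows in document order for reporting
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        for (int rowIndex = 0; rowIndex < rows.size(); rowIndex++) {
            counts.put(rowIndex, rows.get(rowIndex).size());
        }
        return counts;
    }

    public static void main(String[] args) {
        // An uneven table: the second row has only one cell
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Name", "Age", "City"),
                Arrays.asList("Footer"),
                Arrays.asList("Ann", "31", "Pune"));
        for (Map.Entry<Integer, Integer> entry : tableColumnCounts(rows).entrySet()) {
            System.out.println("Number of columns in row " + entry.getKey()
                    + ": " + entry.getValue());
        }
    }
}
```

Using a LinkedHashMap rather than a HashMap keeps the log output in row order, which matters when scanning an uneven table for the rows that break the pattern.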

The definition of getTableDataForColumn is as follows:


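/**
 * Retrieves table data for a specific column, beginning from a specific row.
 * To return data from the first row, pass rowNumber as 0.
 *
 * @param tableSource Jericho Source containing the table
 * @param rawHTMLData true to return raw cell HTML, false to return cell content only
 * @param rowNumber zero-based index of the first row to read
 * @param columnNumber zero-based index of the column to read
 * @return Table data
 */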


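public static List<String> getTableDataForColumn(Source tableSource,
        boolean rawHTMLData, int rowNumber, int columnNumber) {
    // Table data to be returned
    List<String> tableData = new ArrayList<>();
    // Collect table rows
    List<Element> tableRows = tableSource.getAllElements(HTMLElementName.TR);
    // Loop through table rows, starting at rowNumber
    for (int tableRowIndex = rowNumber; tableRowIndex < tableRows.size(); tableRowIndex++) {
        List<Element> data = tableRows.get(tableRowIndex).getAllElements(HTMLElementName.TD);
        // Only read the cell if the supplied index is within the row's size.
        // This check is useful when retrieving data from an uneven HTML table.
        if (columnNumber < data.size()) {
            Element cell = data.get(columnNumber);
            tableData.add(rawHTMLData ? cell.toString() : cell.getContent().toString());
        }
    }
    return tableData;
}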
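The bounds check on columnNumber is what makes uneven tables safe: rows too short to have the requested column are skipped instead of throwing IndexOutOfBoundsException. Here is a minimal stdlib-only sketch of the same column lookup over a hypothetical in-memory table (rows as lists of cell strings, standing in for Jericho's TR and TD elements). The class name ColumnDataSketch and the method name columnFrom are illustrative assumptions, not part of the author's HTMLTableParser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnDataSketch {

    // Hypothetical helper mirroring the column lookup loop:
    // starts at rowNumber and collects the cell at columnNumber from each row.
    static List<String> columnFrom(List<List<String>> rows, int rowNumber, int columnNumber) {
        List<String> columnData = new ArrayList<>();
        for (int rowIndex = rowNumber; rowIndex < rows.size(); rowIndex++) {
            List<String> cells = rows.get(rowIndex);
            // Skip rows that are too short, as happens in uneven tables
            if (columnNumber < cells.size()) {
                columnData.add(cells.get(columnNumber));
            }
        }
        return columnData;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Name", "Age"),
                Arrays.asList("Total"),
                Arrays.asList("Ann", "31"));
        // Column 1 from row 0: the one-cell "Total" row is skipped
        System.out.println(columnFrom(rows, 0, 1)); // prints [Age, 31]
        // Column 1 from row 1 onward
        System.out.println(columnFrom(rows, 1, 1)); // prints [31]
    }
}
```

Skipping short rows silently keeps the result usable for tables with summary or footer rows, at the cost of losing the link between each value and its original row index; if that link matters, a map keyed by row index would be the safer return type.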



