
HTML Parsing and Selenium

HTML parsing has always been a pressing requirement with Selenium. Though Selenium doesn't have a built-in API for HTML parsing, it integrates easily with an HTML parser to achieve the same. I have experimented with HTML parsing using Jericho, which is a Java library. To begin parsing, the only thing Jericho needs is the HTML source, which can be obtained using the Selenium API getHtmlSource(). Below I have listed the functions I developed using Jericho -

Count number of tables on a page –

// Get Source object for HTML Tables.
Source source = new Source(selenium.getHtmlSource());
List<Element> table = source.getAllElements(HTMLElementName.TABLE);

Reporter.log("Number of Tables are: " +table.size());

***Reporter is part of the TestNG API***

Retrieve table data –

// Retrieve table data from a specific table (index 3, i.e. the fourth table on the page).
Source tableSource = new Source(table.get(3).toString());

Reporter.log("Table data is:" + HTMLTableParser.getTableData(tableSource, false));
Reporter.log("True Table data is:" + HTMLTableParser.getTableData(tableSource, true));


Definition of ***getTableData*** is as follows –


/**
 * Returns the content of an HTML table, i.e. the segment
 * available between its start and end tags.
 *
 * @param tableSource Source built from the table's HTML
 * @param rawHTMLData true to return raw HTML, false to return text only
 *
 * @return HTML table data
 */
public static List getTableData(Source tableSource, Boolean rawHTMLData) {
    // Table data to be returned
    List tableData = new ArrayList();
    // Collect table rows
    List<Element> tableRows = tableSource.getAllElements(HTMLElementName.TR);
    // Loop through table rows
    for (int tableRowIndex = 0; tableRowIndex < tableRows.size(); tableRowIndex++) {
        Element tableRow = tableRows.get(tableRowIndex);
        List<Element> data = tableRow.getAllElements(HTMLElementName.TD);
        // Loop through table columns
        for (int tableColumnIndex = 0; tableColumnIndex < data.size(); tableColumnIndex++) {
            // ...

[...]

    List<Element> tableRows = tableSource.getAllElements(HTMLElementName.TR);
    return tableRows.size();
}
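The traversal in getTableData walks every row, then every cell in that row, and collects either the raw markup or only the text, depending on the rawHTMLData flag. Here is a minimal, library-free sketch of that traversal. It assumes the cells are already available as strings of inner HTML (the real method reads them from Jericho Element objects), the class name TableDataSketch is hypothetical, and the tag-stripping regex is only a crude stand-in for a proper text extractor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TableDataSketch {

    // Crude stand-in for a real text extractor: drops anything that looks like a tag.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "").trim();
    }

    // rows: one inner list per table row, one inner-HTML string per cell (assumed input shape).
    static List<String> getTableData(List<List<String>> rows, boolean rawHtmlData) {
        List<String> tableData = new ArrayList<>();
        // Loop through table rows
        for (List<String> row : rows) {
            // Loop through the cells of the current row
            for (String cell : row) {
                tableData.add(rawHtmlData ? cell : stripTags(cell));
            }
        }
        return tableData;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("<b>Name</b>", "Age"),
                Arrays.asList("Alice", "<i>30</i>"));
        System.out.println("Text only: " + getTableData(rows, false));
        System.out.println("Raw HTML:  " + getTableData(rows, true));
    }
}
```

The result is a flat list in row-major order, so callers that need to know which row a value came from would have to track the row boundaries separately.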


Count number of columns in individual rows –

Map<Integer, Integer> rowAndColumnCount = HTMLTableParser.countTableColumnsInRows(tableSource);
for (Map.Entry<Integer, Integer> rowAndColumnData : rowAndColumnCount.entrySet()) {
    Reporter.log("Number of columns at row: " + rowAndColumnData.getKey()
            + " are: " + rowAndColumnData.getValue());
}

// Get data from individual columns.
Reporter.log("Column specific table data is:" + HTMLTableParser.getTableDataForColumn(tableSource, false, 0, 1));
Reporter.log("Column specific raw table data is:" + HTMLTableParser.getTableDataForColumn(tableSource, true, 0, 1));
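The map returned by countTableColumnsInRows pairs each row index with the number of cells found in that row. Here is a minimal, library-free sketch of that bookkeeping. It assumes the rows have already been extracted into lists of cell strings, and the class and method names are hypothetical, not those of HTMLTableParser.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ColumnCountSketch {

    // Maps each zero-based row index to the number of cells in that row.
    // LinkedHashMap keeps the entries in row order when iterated.
    static Map<Integer, Integer> countColumnsInRows(List<List<String>> rows) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        for (int rowIndex = 0; rowIndex < rows.size(); rowIndex++) {
            counts.put(rowIndex, rows.get(rowIndex).size());
        }
        return counts;
    }

    public static void main(String[] args) {
        // An uneven table: the second row has fewer cells than the others.
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("d"),
                Arrays.asList("e", "f", "g"));
        for (Map.Entry<Integer, Integer> entry : countColumnsInRows(rows).entrySet()) {
            System.out.println("Number of columns at row: " + entry.getKey()
                    + " are: " + entry.getValue());
        }
    }
}
```

An insertion-ordered map is used here so that a reporting loop like the one above prints the rows from top to bottom.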

Definition of ***getTableDataForColumn*** is as follows –

/**
 * Retrieves table data for a specific column, beginning from a specific row.
 * To return data from the first row, pass rowNumber as 0.
 *
 * @param tableSource  Source built from the table's HTML
 * @param rawHTMLData  true to return raw HTML, false to return text only
 * @param rowNumber    index of the row to start from
 * @param columnNumber index of the column to read
 * @return Table data
 */
public static List getTableDataForColumn(Source tableSource,
        Boolean rawHTMLData, int rowNumber, int columnNumber) {
    // Table data to be returned
    List tableData = new ArrayList();
    // Collect table rows
    List<Element> tableRows = tableSource.getAllElements(HTMLElementName.TR);
    // Loop through table rows
    for (int tableRowIndex = rowNumber; tableRowIndex < tableRows.size(); tableRowIndex++) {
        Element tableRow = tableRows.get(tableRowIndex);
        List<Element> data = tableRow.getAllElements(HTMLElementName.TD);
        // If the supplied index is within the size of the table data.
        // This check is useful when retrieving data from an uneven HTML table.
        if (columnNumber < data.size()) {
            // ...
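The bounds check on columnNumber is what keeps the method safe on uneven tables, where some rows have fewer cells than others. Here is a minimal, library-free sketch of that column extraction. It assumes the rows have already been extracted into lists of cell strings, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ColumnDataSketch {

    // Collects the cell at columnNumber from every row, starting at rowNumber.
    // Rows that are too short to contain that column are skipped, not treated as errors.
    static List<String> getColumnData(List<List<String>> rows, int rowNumber, int columnNumber) {
        List<String> columnData = new ArrayList<>();
        for (int rowIndex = rowNumber; rowIndex < rows.size(); rowIndex++) {
            List<String> cells = rows.get(rowIndex);
            // Only read the cell if this row actually has that many columns
            if (columnNumber < cells.size()) {
                columnData.add(cells.get(columnNumber));
            }
        }
        return columnData;
    }

    public static void main(String[] args) {
        // Uneven table: the second row has a single cell.
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("d"),
                Arrays.asList("e", "f"));
        System.out.println(getColumnData(rows, 0, 1));
        System.out.println(getColumnData(rows, 1, 0));
    }
}
```

Without the check, cells.get(columnNumber) would throw an IndexOutOfBoundsException on the first short row.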

Comments

  1. Hi Tarun,

    I am stuck with Selenium RC and am very new to it; please help me.
    I want to capture all the links on a new web page. I have tried almost all the Selenium get commands, but it is not working. A code sample given as an example would be very much appreciated.

