
Tuesday, May 3, 2011

HTML Parsing using jsoup

I came across jsoup recently while automating web accessibility tests with Selenium.
Selenium gets me the page HTML, and jsoup does the magic of extracting the required information from that HTML so I can tell whether the web page is accessibility compliant.
When using jsoup you will mostly be dealing with the Document class (which extends Element) and the Elements class.

Suppose you want to print the 'class' attribute of every "div" on a web page. You could use something like this:


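import java.util.Iterator;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document document = Jsoup.parse(selenium.getHtmlSource());
Elements elements = document.getElementsByTag("div");
for (Iterator<Element> divIterator = elements.iterator(); divIterator.hasNext();) {
    // attr() returns an empty string when the div has no class attribute
    System.out.println(divIterator.next().attr("class"));
}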


Beyond this, if you know an attribute value, you can also check whether it appears under the correct node. This can be used to automate ARIA tests for the role attribute on a web page.
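
A rough sketch of such a role check could look like the following. The class name AriaRoleCheck, the helper misplacedRoles, the sample HTML, and the list of tags allowed to carry role="navigation" are all assumptions made up for illustration; in a real test the HTML would come from Selenium and the allowed tags from your own accessibility rules.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class AriaRoleCheck {

    // Hypothetical helper: returns the tag names of elements that carry the
    // given role but are not one of the tags we expect that role on.
    public static List<String> misplacedRoles(String html, String role, String... allowedTags) {
        Document document = Jsoup.parse(html);
        List<String> offenders = new ArrayList<>();
        // Find every element whose role attribute has the given value
        for (Element element : document.getElementsByAttributeValue("role", role)) {
            boolean allowed = false;
            for (String tag : allowedTags) {
                if (element.tagName().equals(tag)) {
                    allowed = true;
                }
            }
            if (!allowed) {
                offenders.add(element.tagName());
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        // Sample markup standing in for the page source fetched by Selenium
        String html = "<div role=\"navigation\"><a href=\"/\">Home</a></div>"
                    + "<span role=\"navigation\">Misplaced</span>";
        // The allowed tag list here is an assumption, not an ARIA rule
        System.out.println(misplacedRoles(html, "navigation", "div", "nav", "ul"));
    }
}
```

With the sample markup above, only the span is reported, since the div is in the allowed list.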



For a detailed list of jsoup capabilities, visit the jsoup site at http://jsoup.org/