
HTML Parsing using jsoup

I came across jsoup recently while automating web accessibility tests with Selenium.
Selenium gets me the page HTML, and jsoup does the magic of extracting the information needed to check whether the web page is accessibility compliant.
When using jsoup you will mostly be dealing with the Document class (which in turn extends Element) and the Elements class.

Suppose you want to print the 'class' attribute of every "div" on a web page. You could use something like this:


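import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Parse the page source that Selenium returns
Document document = Jsoup.parse(selenium.getHtmlSource());
Elements elements = document.getElementsByTag("div");
// Print the class attribute of each div (attr returns "" when it is missing)
for (Element div : elements) {
    System.out.println(div.attr("class"));
}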


Beyond this, if you know an attribute value, you can also check whether it appears on the correct node. This can be used to automate ARIA tests, for example verifying the role attribute on a web page.
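
The check described above could be sketched roughly as follows. This is only an illustration, not a complete accessibility test: the class name RoleCheck, the helper misplacedRoles, the sample HTML, and the rule that role="navigation" is expected on a "div" are all assumptions made up for the example. It relies on jsoup's getElementsByAttributeValue to find every element with a given role and then compares the tag name of each match.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class RoleCheck {

    // Hypothetical helper: returns the elements that carry the given role
    // but whose tag differs from the tag we expect that role to sit on.
    static List<Element> misplacedRoles(String html, String role, String expectedTag) {
        Document document = Jsoup.parse(html);
        List<Element> offenders = new ArrayList<>();
        for (Element element : document.getElementsByAttributeValue("role", role)) {
            if (!element.tagName().equals(expectedTag)) {
                offenders.add(element);
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        // Sample markup invented for this sketch; in a real test the HTML
        // would come from Selenium instead.
        String html = "<div role=\"navigation\"><ul><li>Home</li></ul></div>"
                + "<span role=\"navigation\">Misplaced</span>";
        for (Element offender : misplacedRoles(html, "navigation", "div")) {
            System.out.println("Unexpected <" + offender.tagName() + "> with role=navigation");
        }
    }
}
```

In a Selenium run, the html argument would be the page source, and the expected tag for each role would come from whatever accessibility rules you are testing against.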



For a detailed list of jsoup capabilities, visit the jsoup site at http://jsoup.org/
