Sunday, January 8, 2012

Finding the broken links in a webpage using Selenium


You may have heard of tools that find broken links on a webpage, such as the Link Checker add-on for Firefox or Xenu. Tools like these have to be installed separately before they can report the broken URLs or 404 pages.

We can write a Selenium script that does the same job. How can we do that?
  • Find the number of links available on the page.
  • Visit each link in turn and read its URL.
  • Get the response code for each URL with the help of the HttpURLConnection class and its getResponseCode() method.
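Stripped of the Selenium calls, these three steps are just a loop: walk the list of link targets, ask for each one's response code, and keep the ones that come back 404. The sketch below shows that loop on its own (Java 8 or later). The class name BrokenLinkLoop, the method findBrokenLinks and the idea of passing the status lookup in as a ToIntFunction are our own illustrative choices, not part of Selenium; passing the lookup in also lets the loop be tried with a canned map instead of a live site.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

public class BrokenLinkLoop {

    // Hypothetical helper: returns the links whose status lookup reports 404.
    // statusOf stands in for a real HTTP check such as getResponseCode().
    public static List<String> findBrokenLinks(List<String> links, ToIntFunction<String> statusOf) {
        List<String> broken = new ArrayList<>();
        for (String link : links) {
            if (statusOf.applyAsInt(link) == 404) {
                broken.add(link);
            }
        }
        return broken;
    }
}
```

In a real run, statusOf would wrap the getResponseCode() method described further down, for example findBrokenLinks(hrefs, url -> getResponseCode(url)) with the IOException handled inside the lambda.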
How to find the number of links on the page?

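We can find the number with the selenium.getXpathCount() method. Count //a[@href] rather than //a: the loop in the next step reads from document.links, which only contains links that have an href attribute, so named anchors such as <a name="top"> counted by //a would push the index past the end of that collection.

DefaultSelenium selenium = new DefaultSelenium("localhost", 4444, "*firefox", "http://www.yahoo.com");
selenium.start();
selenium.open("/");
int linkCount = selenium.getXpathCount("//a[@href]").intValue();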
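getXpathCount() simply counts the nodes an XPath expression matches, so the choice of expression matters: document.links (used in the next step) only holds links that have an href, while //a also matches named anchors. The difference can be seen without a browser. The sketch below evaluates both expressions with the JDK's own javax.xml.xpath API on a small, made-up XHTML fragment; LinkCountDemo and the sample markup are illustrative only, and real-world HTML is often not well-formed enough for an XML parser, which is why the script itself lets the browser do the counting.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class LinkCountDemo {

    // Counts the nodes matched by an XPath expression in a well-formed XHTML string.
    public static int count(String xhtml, String xpath) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        Double n = (Double) XPathFactory.newInstance().newXPath()
                .evaluate("count(" + xpath + ")", doc, XPathConstants.NUMBER);
        return n.intValue();
    }

    public static void main(String[] args) throws Exception {
        // One named anchor (no href) plus two real links
        String page = "<html><body>"
                + "<a name=\"top\">Top</a>"
                + "<a href=\"http://example.com/\">Home</a>"
                + "<a href=\"mailto:someone@example.com\">Mail</a>"
                + "</body></html>";
        System.out.println(count(page, "//a"));        // prints 3
        System.out.println(count(page, "//a[@href]")); // prints 2
    }
}
```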

How to track each and every link on the page?

We can loop over the links one by one using the JavaScript collection this.browserbot.getUserWindow().document.links[]. Each entry is the DOM element for one link, with all of its properties; by passing an expression to selenium.getEval() we extract only its href value.

for (int i = 0; i < linkCount; i++) {
    // JavaScript expression that points at the i-th link on the page
    String currentLink = "this.browserbot.getUserWindow().document.links[" + i + "]";
    // getEval() runs it in the browser and returns the link's href as a string
    String temp = selenium.getEval(currentLink + ".href");
}
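The href values collected this way are not all web pages: a page can also link to mailto: addresses or javascript: handlers, and passing those to getResponseCode() (next step) makes it throw instead of returning a code. A small guard such as the hypothetical isHttpLink() below can be called on each temp value before it is checked; it is a sketch that simply accepts http:// and https:// targets and rejects everything else.

```java
import java.util.Locale;

public class LinkFilter {

    // Only http and https targets can be checked with HttpURLConnection.
    // mailto: links typically fail with a ClassCastException on the cast, and
    // javascript: links with a MalformedURLException (unknown protocol).
    public static boolean isHttpLink(String href) {
        if (href == null) {
            return false;
        }
        String lower = href.trim().toLowerCase(Locale.ROOT);
        return lower.startsWith("http://") || lower.startsWith("https://");
    }
}
```

Inside the loop this becomes a one-line skip: if (!LinkFilter.isHttpLink(temp)) continue;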

How to find out the response code of the URL?

We can use the HttpURLConnection class and its getResponseCode() method to find the response code of a URL.

public static int getResponseCode(String urlString) throws MalformedURLException, IOException {
   URL u = new URL(urlString); 
   HttpURLConnection huc =  (HttpURLConnection)  u.openConnection(); 
   huc.setRequestMethod("GET"); 
   huc.connect(); 
   return huc.getResponseCode();
}
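getResponseCode() as written sets no timeouts, so a single unresponsive server can stall the whole run (HttpURLConnection's default timeout of 0 means wait forever), and it never calls disconnect(). A hedged variant is sketched below. The class name ResponseCodeChecker and the 10-second limits are our own assumptions; setConnectTimeout(), setReadTimeout(), getResponseCode() and disconnect() are standard HttpURLConnection methods, and getResponseCode() connects on its own, so the explicit connect() call can be dropped.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class ResponseCodeChecker {

    // Variant of getResponseCode() with timeouts so one slow server cannot
    // hang the whole run, and with the connection released afterwards.
    // The 10-second values are an arbitrary assumption for this sketch.
    public static int getResponseCode(String urlString) throws IOException {
        HttpURLConnection huc = (HttpURLConnection) new URL(urlString).openConnection();
        try {
            huc.setRequestMethod("GET");
            huc.setConnectTimeout(10000); // ms allowed to open the connection
            huc.setReadTimeout(10000);    // ms allowed to wait for the response
            return huc.getResponseCode(); // connects implicitly if needed
        } finally {
            huc.disconnect();
        }
    }
}
```

A 404 does not raise an exception here: getResponseCode() just returns 404, so the caller can compare the value exactly as the script below does.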

The complete Selenium script is below:

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import org.testng.annotations.AfterMethod;
import org.testng.annotations.BeforeMethod;
import org.testng.annotations.Test;
import com.thoughtworks.selenium.DefaultSelenium;
import com.thoughtworks.selenium.SeleneseTestBase;

public class BrokenURL extends SeleneseTestBase {
    public int invalidLink;
    public DefaultSelenium selenium;

    @BeforeMethod
    public void setUp() throws Exception {
        // Expects a Selenium RC server listening on localhost:4444
        selenium = new DefaultSelenium("localhost", 4444, "*firefox", "http://www.yahoo.com");
        selenium.start();
    }

    @Test
    public void testBrokenLinks() throws Exception {
        invalidLink = 0;
        selenium.open("/");
        // Count only links with an href, to match the document.links collection
        int linkCount = selenium.getXpathCount("//a[@href]").intValue();

        // Append the results to broken_links.txt through a single PrintStream
        PrintStream out = new PrintStream(new FileOutputStream("broken_links.txt", true));
        try {
            out.println("URL : " + selenium.getLocation());
            out.println("--------------------------------------------");
            for (int i = 0; i < linkCount; i++) {
                String currentLink = "this.browserbot.getUserWindow().document.links[" + i + "]";
                String href = selenium.getEval(currentLink + ".href");
                // Skip mailto:, javascript: and other links HttpURLConnection cannot check
                if (!href.startsWith("http://") && !href.startsWith("https://")) {
                    continue;
                }
                int statusCode = getResponseCode(href);
                if (statusCode == 404) {
                    out.println(href + " " + statusCode);
                    invalidLink++;
                }
            }
            out.println("Total broken Links = " + invalidLink);
            out.println(" ");
        } finally {
            out.close();
        }
    }

    public static int getResponseCode(String urlString) throws MalformedURLException, IOException {
        URL u = new URL(urlString);
        HttpURLConnection huc = (HttpURLConnection) u.openConnection();
        huc.setRequestMethod("GET");
        huc.connect();
        return huc.getResponseCode();
    }

    // Without @AfterMethod, TestNG never calls this and the browser stays open
    @AfterMethod
    public void tearDown() {
        selenium.close();
        selenium.stop();
    }
}

The script above finds the broken links (if any) on the yahoo.com home page and appends the URLs that return 404 to a text file called broken_links.txt. If you want to check the links on several pages, you can pass the page URLs in through a TestNG Data Provider or read them from an Excel sheet using the JXL package.
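If an Excel sheet feels heavy for a simple list of pages, a plain text file with one URL per line does the same job; this is a swapped-in alternative to JXL, not the approach described above. The sketch below reads such a file. The class PageList, the method readPageUrls and the convention that blank lines and lines starting with # are ignored are all assumptions made for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class PageList {

    // Reads one page URL per line, skipping blank lines and '#' comment lines.
    // The file layout is an assumption made for this sketch.
    public static List<String> readPageUrls(Path file) throws IOException {
        List<String> urls = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                urls.add(trimmed);
            }
        }
        return urls;
    }
}
```

Each entry can then be opened with selenium.open(url) and run through the same link loop, or wrapped into the Object[][] that a TestNG @DataProvider method returns.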

10 comments:

  1. Hi Veera,

    Can you please provide the code for finding broken links in Selenium 2?

    Thanks,
    Hema

  2. Is it at all possible to extend this to check 404s on images, JS and CSS (including any images loaded from within CSS files)?

  3. That's nice, but what shall we do if the website requires logging in? We can log in with Selenium and get all the links, but how do we modify the getResponseCode method to handle such pages?

  4. How can I use this on an HTTPS application? I have tried all the workarounds for avoiding certificate checks again and again, but I still get the same error message:

    javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: No name matching xxxx.xxxxxxxxxx.xxxx found

  5. Hi Veera,

    I tried the above code but it did not work for me. Is it mandatory to use "DefaultSelenium"?

    Thanks,
    Sudhansu

  6. It's also possible to add a method that checks only specific links, e.g. those on the German .de domain, or that skips mail links.
    More details here: http://www.skillim.com/check-broken-links-on-a-page-using-selenium/