Find Broken Links in a Web Page
To find all the broken links in a web page using Selenium in Java, collect the link elements (tag name "a") using driver.findElements(By.tagName("a")). For each link element, send an HTTP request to the URL in its href attribute. If the link is broken, the HTTP response code is one of the following.
Response Code | Description |
---|---|
400 | Bad Request (Bad Host / Bad URL / Empty / Timeout / Reset) |
404 | Page Not Found |
403 | Forbidden |
410 | Gone |
408 | Request Timeout |
503 | Service Unavailable |
In this tutorial, a link counts as broken if the request for it returns any of the codes above. Your definition of a broken link may differ depending on your application's requirements, so adjust the list of codes accordingly.
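Because the definition of "broken" is a list of codes that may change, it can help to keep that list in one place. The following is a minimal sketch, assuming Java 9 or later for Set.of; the class name BrokenLinkRules and the method isBroken are hypothetical names chosen for illustration, not part of Selenium or the JDK.

```java
import java.util.Set;

final class BrokenLinkRules {
    // Status codes this tutorial treats as "broken"; edit the set to match your own definition.
    static final Set<Integer> BROKEN_CODES = Set.of(400, 403, 404, 408, 410, 503);

    private BrokenLinkRules() {}

    // Returns true when the given HTTP response code is in the broken set.
    static boolean isBroken(int responseCode) {
        return BROKEN_CODES.contains(responseCode);
    }
}
```

With a helper like this, a chain of == comparisons on the response code can be replaced by a single call such as BrokenLinkRules.isBroken(respCode), and changing the definition means editing only the set.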
Example
In the following program, we use Selenium with Java to visit google.com, extract all the links in the web page, and check each link to see whether it is broken.
Java Program
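import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class MyAppTest {
    public static void main(String[] args) {
        // Path to the ChromeDriver executable; adjust for your machine.
        System.setProperty("webdriver.chrome.driver", "/usr/local/bin/chromedriver");
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://google.com/ncr");
            // Collect every anchor element on the page.
            List<WebElement> links = driver.findElements(By.tagName("a"));
            for (WebElement link : links) {
                String url = link.getAttribute("href");
                // Skip anchors without an href and non-HTTP links such as mailto: or javascript:,
                // which cannot be cast to HttpURLConnection.
                if (url == null || !(url.startsWith("http://") || url.startsWith("https://"))) {
                    continue;
                }
                HttpURLConnection connection = null;
                try {
                    connection = (HttpURLConnection) new URL(url).openConnection();
                    // HEAD asks for the status line and headers only, not the body.
                    connection.setRequestMethod("HEAD");
                    // Timeouts in milliseconds; 5000 is an arbitrary choice.
                    connection.setConnectTimeout(5000);
                    connection.setReadTimeout(5000);
                    connection.connect();
                    int respCode = connection.getResponseCode();
                    if (respCode == 400 ||
                            respCode == 403 ||
                            respCode == 404 ||
                            respCode == 408 ||
                            respCode == 410 ||
                            respCode == 503) {
                        System.out.println("[Broken] - " + url);
                    } else {
                        System.out.println("[Not Broken] - " + url);
                    }
                } catch (MalformedURLException e) {
                    e.printStackTrace();
                } catch (IOException e) {
                    e.printStackTrace();
                } finally {
                    // Release the connection for this link, if one was opened.
                    if (connection != null) {
                        connection.disconnect();
                    }
                }
            }
        } finally {
            // Close the browser even if something above fails.
            driver.quit();
        }
    }
}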
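A page often links to the same target several times, and some href values differ only by a fragment such as #top. One possible refinement, sketched here under the assumption that the href strings have already been read from the link elements, is to reduce them to a list of unique HTTP(S) URLs before sending any requests. The class LinkFilter and the method uniqueHttpUrls are hypothetical helpers, not part of Selenium.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class LinkFilter {
    private LinkFilter() {}

    // Returns the distinct http/https URLs from the given href values,
    // in first-seen order, with any #fragment removed.
    static List<String> uniqueHttpUrls(List<String> hrefs) {
        Set<String> unique = new LinkedHashSet<>();
        for (String href : hrefs) {
            if (href == null) {
                continue; // anchor without an href attribute
            }
            String url = href.trim();
            int hash = url.indexOf('#');
            if (hash >= 0) {
                url = url.substring(0, hash); // the fragment is not sent to the server
            }
            String lower = url.toLowerCase(Locale.ROOT);
            if (lower.startsWith("http://") || lower.startsWith("https://")) {
                unique.add(url);
            }
        }
        return new ArrayList<>(unique);
    }
}
```

The returned list can then be checked one URL at a time, so each distinct target is requested only once.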
Screenshots
1. Initialize web driver and visit google.com.
WebDriver driver = new ChromeDriver();
driver.get("https://google.com/ncr");
2. Verify if each link is broken or not.
Console Output.