Extract Title from Web Pages in Java

This example shows a simple program that extracts the title from a web page in Java. It is built with the Java Jsoup API, which parses HTML documents from a URL or another source. The program has been tested, and its output is shared in this post.

Example Program (WebUtils.java)

package com.dineshkrish;

import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

/**
 * Utility for reading the title of a web page.
 *
 * @author Dinesh Krishnan
 */
public class WebUtils {

    // Extracts the title from the page at the given URL.
    // Returns null if the URL is malformed or the page cannot be fetched.
    public String getTitle(final String link) {
        String title = null;
        try {
            // create the URL object
            URL url = new URL(link);
            // fetch and parse the HTML document (timeout of 5000 ms)
            Document document = Jsoup.parse(url, 5000);
            // extract the title from the parsed document
            title = document.title();
        } catch (MalformedURLException e) {
            System.out.println(e.getMessage());
            e.printStackTrace();
        } catch (IOException e) {
            System.out.println(e.getMessage());
            e.printStackTrace();
        }
        return title;
    }

    public static void main(String[] args) {
        // input URL; change it as needed
        String link = "http://www.google.com";
        WebUtils utils = new WebUtils();
        // print the extracted title
        System.out.println(utils.getTitle(link));
    }
}
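
The getTitle method above returns null when the link is malformed or the page cannot be fetched, so a caller may want to check the link first and substitute a placeholder afterwards. The following is only a minimal sketch of that idea using the standard library. The class LinkChecker and its methods isHttpLink and titleOrDefault are hypothetical names introduced for illustration; they are not part of the original example or of Jsoup.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration; not part of Jsoup or the original post.
public class LinkChecker {

    // Returns true only for absolute http/https links that name a host.
    public static boolean isHttpLink(String link) {
        if (link == null || link.trim().isEmpty()) {
            return false;
        }
        try {
            URI uri = new URI(link.trim());
            String scheme = uri.getScheme();
            boolean http = "http".equalsIgnoreCase(scheme)
                    || "https".equalsIgnoreCase(scheme);
            return http && uri.getHost() != null;
        } catch (URISyntaxException e) {
            // Syntactically invalid links are simply rejected.
            return false;
        }
    }

    // Replaces a null or blank title with the given fallback text.
    public static String titleOrDefault(String title, String fallback) {
        return (title == null || title.trim().isEmpty()) ? fallback : title;
    }

    public static void main(String[] args) {
        System.out.println(isHttpLink("http://www.google.com"));
        System.out.println(isHttpLink("www.google.com"));
        System.out.println(titleOrDefault(null, "(no title)"));
    }
}
```

A caller could, for instance, invoke getTitle only when isHttpLink(link) is true and wrap the result in titleOrDefault. This check is stricter than the URL constructor, which accepts other protocols as well, so treat it as one possible design rather than a requirement.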

Maven Dependency (pom.xml)

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.dineshkrish</groupId>
    <artifactId>JsoupExample</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.9.2</version>
        </dependency>
    </dependencies>
</project>

Output

Google

