sub-link-extractor is a Java library whose DefaultLinkExtractor class extracts all the links from a given web page. It uses jsoup to parse the HTML content.
For those using Gradle, add the following to your build.gradle file:
```gradle
implementation 'io.github.revfactory:sub-link-extractor:0.1.1'
```

For Maven users, add the following to your pom.xml:
```xml
<dependency>
    <groupId>io.github.revfactory</groupId>
    <artifactId>sub-link-extractor</artifactId>
    <version>0.1.1</version>
</dependency>
```

Then create an extractor and call extractLinks:

```java
import io.github.revfactory.DefaultLinkExtractor;
import io.github.revfactory.LinkExtractorStrategy;

import java.util.List;

// ...

LinkExtractorStrategy extractor = new DefaultLinkExtractor(1500); // 1.5-second delay
List<String> links = extractor.extractLinks("http://example.com/docs");
```

The library can also be customized by implementing the LinkExtractorStrategy interface to provide your own extraction strategy.
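A minimal sketch of a custom strategy follows. It assumes the interface declares a single method, `List<String> extractLinks(String url) throws IOException`. That signature is inferred from the usage example and the IOException note, so check it against the library's source. The interface is redeclared locally so the snippet compiles on its own; in a real project you would import `io.github.revfactory.LinkExtractorStrategy` instead. `SameHostLinkExtractor` is a hypothetical decorator, not part of the library. It delegates to another strategy and keeps only links on the same host as the requested page.

```java
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// ASSUMED shape of the library's interface, redeclared so this sketch is self-contained.
// In a real project, import io.github.revfactory.LinkExtractorStrategy instead.
interface LinkExtractorStrategy {
    List<String> extractLinks(String url) throws IOException;
}

// Hypothetical decorator: delegates to another strategy (e.g. DefaultLinkExtractor)
// and keeps only links whose host matches the host of the requested page.
class SameHostLinkExtractor implements LinkExtractorStrategy {
    private final LinkExtractorStrategy delegate;

    SameHostLinkExtractor(LinkExtractorStrategy delegate) {
        this.delegate = Objects.requireNonNull(delegate);
    }

    @Override
    public List<String> extractLinks(String url) throws IOException {
        String host = URI.create(url).getHost();
        List<String> sameHost = new ArrayList<>();
        for (String link : delegate.extractLinks(url)) {
            try {
                // Hostnames are case-insensitive; links without a host (e.g. mailto:) are dropped.
                if (host != null && host.equalsIgnoreCase(URI.create(link).getHost())) {
                    sameHost.add(link);
                }
            } catch (IllegalArgumentException e) {
                // Skip links that are not syntactically valid URIs.
            }
        }
        return sameHost;
    }
}
```

Wrapping rather than subclassing keeps the filtering independent of how links are fetched. Assuming DefaultLinkExtractor implements the interface, as the usage example suggests, you would write `new SameHostLinkExtractor(new DefaultLinkExtractor(1500))`.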
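The extractLinks method can throw an IOException, for example when the page cannot be fetched because of a network error. Catch it, or declare it in your method's throws clause, so that a failed request is handled deliberately instead of crashing your application.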
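One way to handle the exception is sketched below. It uses the same assumed interface signature, redeclared locally so the snippet compiles on its own. `LinkFetching.extractOrEmpty` is a hypothetical helper, not part of the library. It logs the failure with java.util.logging and returns an empty list, so a crawl over many pages can continue past one bad page.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// ASSUMED shape of the library's interface, redeclared so this sketch is self-contained.
interface LinkExtractorStrategy {
    List<String> extractLinks(String url) throws IOException;
}

final class LinkFetching {
    private static final Logger LOG = Logger.getLogger(LinkFetching.class.getName());

    private LinkFetching() {}

    // Hypothetical helper: returns the extracted links, or an empty list if
    // extraction fails with an IOException (the failure is logged, not rethrown).
    static List<String> extractOrEmpty(LinkExtractorStrategy extractor, String url) {
        try {
            return extractor.extractLinks(url);
        } catch (IOException e) {
            LOG.log(Level.WARNING, "Could not extract links from " + url, e);
            return Collections.emptyList();
        }
    }
}
```

Swallowing the exception suits best-effort crawling. If the caller must know that a page failed, rethrow or propagate the IOException instead.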
This project is distributed under the Apache 2.0 license.