TagSoup.
TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design.
Here is the dependency declaration for tagsoup. If you use Maven, add the following dependency to your POM file.
<dependency>
  <groupId>org.ccil.cowan.tagsoup</groupId>
  <artifactId>tagsoup</artifactId>
  <version>1.2</version>
</dependency>
Name: Apache License 2.0
URL: http://www.apache.org/licenses/LICENSE-2.0.txt
The following packages are defined in tagsoup-1.2.jar:
org.ccil.cowan.tagsoup
org.ccil.cowan.tagsoup.jaxp
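Because TagSoup exposes a SAX interface, its parser can be driven like any other org.xml.sax.XMLReader. The following is a minimal sketch, not taken from the TagSoup distribution: the class name TagSoupSketch and the helper elementNames are hypothetical, and the parser class org.ccil.cowan.tagsoup.Parser is loaded by name so that the code compiles without the jar (tagsoup-1.2.jar must still be on the classpath when main runs).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of TagSoup.
public class TagSoupSketch {

    // Parses the markup with the given SAX reader and returns the
    // element names in the order their start tags are reported.
    public static List<String> elementNames(XMLReader reader, String markup) throws Exception {
        final List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Fall back to qName when the reader is not namespace-aware.
                names.add(localName.isEmpty() ? qName : localName);
            }
        });
        reader.parse(new InputSource(new StringReader(markup)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: tagsoup-1.2.jar is on the runtime classpath.
        XMLReader reader = (XMLReader) Class.forName("org.ccil.cowan.tagsoup.Parser")
                .getDeclaredConstructor().newInstance();
        // Deliberately malformed HTML: no closing tags.
        System.out.println(elementNames(reader, "<p>unclosed <b>bold"));
    }
}
```

The helper accepts any XMLReader, so the same handler code can be exercised with the JDK's built-in XML parser on well-formed input and with TagSoup on real-world HTML.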
Here is the content of the POM file.
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.ccil.cowan.tagsoup</groupId>
  <artifactId>tagsoup</artifactId>
  <name>TagSoup</name>
  <version>1.2</version>
  <packaging>jar</packaging>
  <description>TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML. TagSoup also includes a command-line processor that reads HTML files and can generate either clean HTML or well-formed XML that is a close approximation to XHTML.</description>
  <url>http://home.ccil.org/~cowan/XML/tagsoup/</url>
  <licenses>
    <license>
      <name>Apache License 2.0</name>
      <url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
      <distribution>repo</distribution>
    </license>
  </licenses>
</project>