Parse Link and Image tags in an HTML file (VB.Net, Regular Expressions)

Imports System.Collections
Imports System.Text.RegularExpressions

Public Class MainClass
    ' Returns the href value of every <a> tag found in the HTML string.
    Public Function ParseLinks(ByVal HTML As String) As ArrayList
        Dim arrLinks As New ArrayList()
        ' <a\b[^>]*? stays inside a single tag, so several links on one
        ' line are matched separately. Group 1 holds the href value,
        ' either double-quoted or unquoted.
        Dim objRegEx As New Regex( _
            "<a\b[^>]*?\bhref\s*=\s*(?:""(?<1>[^""]*)""|(?<1>[^\s>]+))", _
            RegexOptions.IgnoreCase)
        Dim objMatch As Match = objRegEx.Match(HTML)
        While objMatch.Success
            ' Collect the captured URL before moving to the next match.
            arrLinks.Add(objMatch.Groups(1).Value)
            objMatch = objMatch.NextMatch()
        End While
        Return arrLinks
    End Function

End Class
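
The title also mentions image tags, which the listing above does not cover. A minimal sketch of the same regular-expression approach applied to <img> tags might look like the following. The module name ImageTagDemo, the function ParseImages, and the sample HTML are assumptions made for illustration and are not part of the original listing.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module ImageTagDemo
    ' Hypothetical helper (not from the original listing): returns the src
    ' value of every <img> tag found in the given HTML string.
    Public Function ParseImages(ByVal html As String) As List(Of String)
        Dim sources As New List(Of String)()
        ' Group 1 captures the src value, either double-quoted or unquoted.
        Dim pattern As String = _
            "<img\b[^>]*?\bsrc\s*=\s*(?:""(?<1>[^""]*)""|(?<1>[^\s>]+))"
        For Each m As Match In Regex.Matches(html, pattern, RegexOptions.IgnoreCase)
            sources.Add(m.Groups(1).Value)
        Next
        Return sources
    End Function

    Public Sub Main()
        ' Sample input, assumed for this sketch.
        Dim html As String = _
            "<p><img src=""logo.png"" alt=""Logo""><IMG SRC=photo.jpg></p>"
        ' Prints logo.png, then photo.jpg.
        For Each src As String In ParseImages(html)
            Console.WriteLine(src)
        Next
    End Sub
End Module
```

A pattern like this only handles reasonably well-formed markup. For arbitrary or malformed HTML, a real HTML parser is likely to be more robust than a regular expression.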


Related examples in the same category

1.Use Regular Expression to Validate Email address
2.Validate TextBox: cannot be empty
3.TextBox validation: validate in KeyPressed Event
4.Use Regular Expressions to parse IP address
5.Use Regex to separate strings
6.Use Regex.Split to split string
7.Use Regex to match
8.Regular expression to parse time: 04:03:27
9.Regular Expressions Match
10.Use Regular Expressions to Split String
11.Demonstrating Class Regex
12.Regular Expressions: Validate Name
13.Regular Expressions: Validate Address
14.Regular Expressions: Validate City
15.Regular Expressions: Validate Zip Code
16.Regular Expressions: validate Phone Number
17.Using Regex method Replace: ^
18.Using Regex method Replace: by another string
19.Using Regex method Replace:\w+
20.Using Regex method Replace:First 3 digits replaced
21.Using Regex method Replace: String split at commas
22.Strip tags from HTML to create Text version of a web page
23.\w matches any word character.
24.(\w) matches a word character. This is the first capturing group.
25.\1 match the value of the first capture.
26.\s matches any white-space character.
27.\b: Begin the match at a word boundary.
28.\w+: Match one or more word characters.
29.(e)*: Match an "e" zero or more times.
30.(\s|$) Match either a whitespace character or the end of the input string.
31.^: Begin the match at the beginning of the input string.
32.\D: Match a non-digit character.
33.\d{1,5} Matches from one to five decimal digits.
34.\D* matches zero or more non-digit characters.
35.$ Matches the end of the input string.
36.\S matches any non-white-space character.
37.\b: Begin the match at a word boundary.
38.(\S+): matches one or more non-white-space characters. This is the first capturing group.
39.\s*: matches zero or more white-space characters.
40.Regular expression for class and group
41.Decimal Digit Character: \d
42.Parse Image tags in a HTML file
43.Regex Class represents an immutable regular expression.
44.Define a regular expression for repeated words
45.Find duplicates
46.ArgumentException Class is thrown when one of the arguments provided to a method is not valid.
47.ArgumentOutOfRangeException is thrown when the value of an argument is outside the allowable range
48.Capture Class represents the results from a single successful subexpression capture.
49.CharUnicodeInfo Class has information about a Unicode character
50.Validate email address