Strip tags from HTML to create a text version of a web page (VB.Net, Regular Expressions)

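Public Class MainClass
    Public Shared Sub Main()
        ' Download the page, strip its markup, and print the remaining text.
        System.Console.WriteLine(StripTags(GetPageHTML("http://www.g.com/")))
    End Sub

    ' Removes every "<...>" run. Entities such as &amp; are not decoded, and
    ' the contents of <script> and <style> elements are left in the output.
    Public Shared Function StripTags(ByVal HTML As String) As String
        Return System.Text.RegularExpressions.Regex.Replace(HTML, "<[^>]*>", "")
    End Function

    ' Downloads the page at URL and decodes the bytes as UTF-8,
    ' regardless of the charset the server declares.
    Public Shared Function GetPageHTML(ByVal URL As String) As String
        ' WebClient implements IDisposable, so release it with Using.
        Using objWC As New System.Net.WebClient()
            Return System.Text.Encoding.UTF8.GetString(objWC.DownloadData(URL))
        End Using
    End Function
End Class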
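
The single pattern "<[^>]*>" removes tags only, so script and style bodies, HTML comments, and entities such as &amp; survive in the output. The following is a minimal sketch of one way to tidy those up, still using regular expressions. The class name HtmlText and the function ToPlainText are hypothetical names introduced for illustration and are not part of the original example. Regular expressions cannot parse arbitrary HTML reliably, so treat this as an approximation for simple pages, not a general solution.

```vbnet
Imports System.Net
Imports System.Text.RegularExpressions

' Hypothetical helper, not part of the original example.
Public Class HtmlText
    Public Shared Function ToPlainText(ByVal html As String) As String
        ' Singleline lets "." match newlines, so multi-line blocks are covered.
        Dim opts As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline

        ' Remove <script> and <style> elements together with their contents.
        Dim text As String = Regex.Replace(html, "<(script|style)\b[^>]*>.*?</\1\s*>", " ", opts)

        ' Remove HTML comments, which may contain ">" characters.
        text = Regex.Replace(text, "<!--.*?-->", " ", opts)

        ' Replace each remaining tag with a space so adjacent words stay apart.
        text = Regex.Replace(text, "<[^>]*>", " ")

        ' Decode character entities such as &amp; and &lt;.
        text = WebUtility.HtmlDecode(text)

        ' Collapse runs of white space and trim both ends.
        Return Regex.Replace(text, "\s+", " ").Trim()
    End Function
End Class
```

Assuming the class above is compiled alongside MainClass, Main could call HtmlText.ToPlainText(GetPageHTML("http://www.g.com/")) in place of StripTags. Replacing tags with a space instead of an empty string is a deliberate choice here: it keeps words from neighbouring elements from running together, at the cost of needing the final white-space cleanup.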

Related examples in the same category

1.Use Regular Expression to Validate Email address
2.Validate TextBox: cannot be empty
3.TextBox validation: validate in KeyPressed Event
4.Use Regular Expressions to parse IP address
5.Use Regex to separate strings
6.Use Regex.Split to split string
7.Use Regex to match
8.Regular expression to parse time: 04:03:27
9.Regular Expressions Match
10.Use Regular Expressions to Split String
11.Demonstrating Class Regex
12.Regular Expressions: Validate Name
13.Regular Expressions: Validate Address
14.Regular Expressions: Validate City
15.Regular Expressions: Validate Zip Code
16.Regular Expressions: validate Phone Number
17.Using Regex method Replace: ^
18.Using Regex method Replace: by another string
19.Using Regex method Replace:\w+
20.Using Regex method Replace:First 3 digits replaced
21.Using Regex method Replace: String split at commas
22.\w matches any word character.
23.(\w) matches a word character. This is the first capturing group.
24.\1 matches the value of the first capture.
25.\s matches any white-space character.
26.\b: Begin the match at a word boundary.
27.\w+: Match one or more word characters.
28.(e)*: Match an "e" zero or more times.
29.(\s|$) Match either a whitespace character or the end of the input string.
30.^: Begin the match at the beginning of the input string.
31.\D: Match a non-digit character.
32.\d{1,5} Matches from one to five decimal digits.
33.\D* matches zero or more non-digit characters.
34.$ Matches the end of the input string.
35.\S matches any non-white-space character.
36.\b: Begin the match at a word boundary.
37.(\S+): matches one or more non-white-space characters. This is the first capturing group.
38.\s*: matches zero or more white-space characters.
39.Regular expression for class and group
40.Decimal Digit Character: \d
41.Parse Link and Image tags in a HTML file
42.Parse Image tags in a HTML file
43.Regex Class represents an immutable regular expression.
44.Define a regular expression for repeated words
45.Find duplicates
46.ArgumentException Class is thrown when one of the arguments provided to a method is not valid.
47.ArgumentOutOfRangeException is thrown when the value of an argument is outside the allowable range
48.Capture Class represents the results from a single successful subexpression capture.
49.CharUnicodeInfo Class has information about a Unicode character
50.Validate email address
51.Regex.Split