C# Regular Expressions Tutorial
A regular expression defines a search pattern for strings. Regular expressions can be used to search, edit and manipulate text. The pattern defined by the regular expression may match one or several times or not at all for a given string.
The abbreviation for regular expression is regex.
Regular expressions are supported by most programming languages, e.g., C#, Java, Perl, Groovy, etc. Unfortunately, each language supports regular expressions slightly differently.
No | Regular Expression | Description |
1 | . | Matches any single character (except a newline, by default). |
2 | ^regex | Finds regex that must match at the beginning of the input (of each line with RegexOptions.Multiline). |
3 | regex$ | Finds regex that must match at the end of the input (of each line with RegexOptions.Multiline). |
4 | [abc] | Set definition, can match the letter a or b or c. |
5 | [abc][vz] | Set definition, can match a or b or c followed by either v or z. |
6 | [^abc] | When a caret appears as the first character inside square brackets, it negates the pattern. This can match any character except a or b or c. |
7 | [a-d1-7] | Ranges: matches one character that is a letter between a and d or a digit from 1 to 7. |
8 | X|Z | Finds X or Z. |
9 | XZ | Finds X directly followed by Z. |
10 | $ | Checks if a line end follows. |
11 | \d | Any digit. For ASCII input, short for [0-9]; by default .NET also matches other Unicode decimal digits. |
12 | \D | A non-digit, short for [^\d] |
13 | \s | A whitespace character. For ASCII input, short for [ \t\n\x0b\r\f]; .NET also matches other Unicode whitespace. |
14 | \S | A non-whitespace character, short for [^\s] |
15 | \w | A word character. For ASCII input, short for [a-zA-Z_0-9]; .NET also matches Unicode letters and digits. |
16 | \W | A non-word character, short for [^\w] |
17 | \S+ | One or more non-whitespace characters |
18 | \b | Matches a word boundary, where a word character is as defined for \w. |
19 | * | Occurs zero or more times, is short for {0,} |
20 | + | Occurs one or more times, is short for {1,} |
21 | ? | Occurs zero or one time, is short for {0,1} |
22 | {X} | Occurs exactly X times; {} quantifies the preceding element. |
23 | {X,Y} | Occurs between X and Y times. |
24 | *? | ? after a quantifier makes it a reluctant quantifier. It tries to find the smallest match. |
Special characters in C# Regex:
\.[{()*+?^$|
The characters listed above are special characters. If you want C# Regex to treat one of them as an ordinary character, put a \ in front of it.
For example, C# Regex interprets the dot character . as any single character. If you want it interpreted as a literal dot, put a \ in front of it.
// Regex pattern that describes any single character.
string anyChar = ".";

// Regex patterns that describe a literal dot character.
string dot = "\\.";
string dotVerbatim = @"\.";
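When the text to search for comes from a variable, escaping each special character by hand is error-prone. The sketch below uses the Regex.Escape method to do the escaping; the class EscapeSketch and the helper ContainsLiteral are hypothetical names chosen for illustration, not part of the original tutorial.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeSketch
{
    // Hypothetical helper: true if 'input' contains 'literal' as plain text.
    public static bool ContainsLiteral(string input, string literal)
    {
        // Regex.Escape puts a \ in front of every special character.
        string pattern = Regex.Escape(literal);
        return Regex.IsMatch(input, pattern);
    }

    public static void Main()
    {
        Console.WriteLine(Regex.Escape("1.5"));                  // 1\.5
        Console.WriteLine(Regex.IsMatch("price: 125", "1.5"));   // True: . matches '2'
        Console.WriteLine(ContainsLiteral("price: 125", "1.5")); // False
        Console.WriteLine(ContainsLiteral("price: 1.5", "1.5")); // True
    }
}
```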
- Regex class
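...
// Checks whether the regex finds a match in the input string.
public bool IsMatch(string input)
..
The static method Regex.IsMatch(string input, string pattern) lets you check whether a string contains a match for the regex. The match may be found anywhere in the string; to require that the string matches from its first character or up to its last one, anchor the pattern with ^ or $. This is the most common way. Consider these examples: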
Regex .
In C# regular expressions, the dot character (.) is a special character. It represents any single character. If you want C# to treat it as a dot in the usual sense, you need to write "\\." or @"\."
DotExample.cs
using System;
using System.Text.RegularExpressions;

namespace RegularExpressionTutorial
{
    class DotExample
    {
        public static void Main(string[] args)
        {
            // String with 0 characters (empty string).
            string s1 = "";
            Console.WriteLine("s1=" + s1);

            // Check s1
            // Contains any single character.
            // Rule .
            // ==> False
            bool match = Regex.IsMatch(s1, ".");
            Console.WriteLine(" -Match . " + match);

            // String with 1 character.
            string s2 = "a";
            Console.WriteLine("s2=" + s2);

            // Check s2
            // Contains any single character.
            // Rule .
            // ==> True
            match = Regex.IsMatch(s2, ".");
            Console.WriteLine(" -Match . " + match);

            // String with 3 characters.
            string s3 = "abc";
            Console.WriteLine("s3=" + s3);

            // Check s3
            // Contains any single character.
            // Rule .
            // ==> True
            match = Regex.IsMatch(s3, ".");
            Console.WriteLine(" -Match . " + match);

            // String with 3 characters.
            string s4 = "abc";
            Console.WriteLine("s4=" + s4);

            // Check s4
            // Contains a literal dot character.
            // ==> False
            match = Regex.IsMatch(s4, @"\.");
            Console.WriteLine(" -Match \\. " + match);

            // String with 1 character (dot character).
            string s5 = ".";
            Console.WriteLine("s5=" + s5);

            // Check s5
            // Contains a literal dot character.
            // ==> True
            match = Regex.IsMatch(s5, @"\.");
            Console.WriteLine(" -Match \\. " + match);

            Console.Read();
        }
    }
}
Run the example:

Another example using Regex.IsMatch(string, string):
RegexIsMatchExample.cs
using System;
using System.Text.RegularExpressions;

namespace RegularExpressionTutorial
{
    class RegexIsMatchExample
    {
        public static void Main(string[] args)
        {
            // String with one character
            string s2 = "m";
            Console.WriteLine("s2=" + s2);

            // Check s2
            // Starts with 'm'
            // Rule ^
            // ==> true
            bool match = Regex.IsMatch(s2, "^m");
            Console.WriteLine(" -Match ^m " + match);

            // A string with 7 characters
            string s3 = "MMnnnnn";
            Console.WriteLine("s3=" + s3);

            // Check s3
            // Starts with MM
            // Rule ^
            // ==> true
            match = Regex.IsMatch(s3, "^MM");
            Console.WriteLine(" -Match ^MM " + match);

            // Check s3
            // Starts with MM
            // Next character 'n', appearing one or more times.
            // Rule ^ and +
            // ==> true
            match = Regex.IsMatch(s3, "^MMn+");
            Console.WriteLine(" -Match ^MMn+ " + match);

            // String with one character
            string s4 = "p";
            Console.WriteLine("s4=" + s4);

            // Check s4 ends with 'p'
            // Rule $
            // ==> true
            match = Regex.IsMatch(s4, "p$");
            Console.WriteLine(" -Match p$ " + match);

            // A string with 6 characters.
            string s5 = "122nnp";
            Console.WriteLine("s5=" + s5);

            // Check s5 ends with 'p'
            // ==> true
            match = Regex.IsMatch(s5, "p$");
            Console.WriteLine(" -Match p$ " + match);

            // Check s5
            // Any single character (Rule . )
            // Followed by 'n', appearing one to three times (Rule n{1,3} )
            // Ends with 'p' (Rule: p$)
            // Combine the rules: . , {x, y}, $
            // ==> true
            match = Regex.IsMatch(s5, ".n{1,3}p$");
            Console.WriteLine(" -Match .n{1,3}p$ " + match);

            string s6 = "2ybcd";
            Console.WriteLine("s6=" + s6);

            // Check s6
            // Contains '2'
            // Next 'x' or 'y' or 'z' (Rule [xyz])
            // Followed by any character, appearing 0 or more times.
            // Combine the rules: [xyz] , *
            // ==> true
            match = Regex.IsMatch(s6, "2[xyz].*");
            Console.WriteLine(" -Match 2[xyz].* " + match);

            string s7 = "2bkbv";
            Console.WriteLine("s7=" + s7);

            // Check s7
            // Any single character
            // Followed by 'a' or 'b' or 'c': [abc]
            // Next 'z' or 'v': [zv]
            // Followed by any character (0 or more times)
            // ==> true
            match = Regex.IsMatch(s7, ".[abc][zv].*");
            Console.WriteLine(" -Match .[abc][zv].* " + match);

            Console.Read();
        }
    }
}
Results of running the example:

Next example:
RegexIsMatchExample2.cs
using System;
using System.Text.RegularExpressions;

namespace RegularExpressionTutorial
{
    class RegexIsMatchExample2
    {
        public static void Main(string[] args)
        {
            string s = "The film Tom and Jerry!";

            // Check s
            // Any characters, appearing 0 or more times (Rule: .*)
            // Next "Tom" or "Jerry"
            // Followed by any single character (Rule: .)
            // Combine the rules: ., *, X|Z
            // ==> true
            bool match = Regex.IsMatch(s, ".*(Tom|Jerry).");
            Console.WriteLine("s=" + s);
            Console.WriteLine("-Match .*(Tom|Jerry). " + match);

            s = "The cat";
            // ==> false
            match = Regex.IsMatch(s, ".*(Tom|Jerry).");
            Console.WriteLine("s=" + s);
            Console.WriteLine("-Match .*(Tom|Jerry). " + match);

            s = "The Tom cat";
            // ==> true
            match = Regex.IsMatch(s, ".*(Tom|Jerry).");
            Console.WriteLine("s=" + s);
            Console.WriteLine("-Match .*(Tom|Jerry). " + match);

            Console.Read();
        }
    }
}
Results of running the example:

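Regex.IsMatch returns true as soon as the pattern is found anywhere in the input, so it does not by itself test the whole string. The sketch below shows one way to require a whole-string match by wrapping a pattern in the anchors \A and \z; the class WholeStringSketch and the helper IsWholeMatch are hypothetical names introduced here for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeStringSketch
{
    // Hypothetical helper: true only if 'pattern' matches all of 'input'.
    public static bool IsWholeMatch(string input, string pattern)
    {
        // \A anchors at the very start of the input, \z at the very end.
        // The non-capturing group keeps an alternation such as a|b intact.
        return Regex.IsMatch(input, @"\A(?:" + pattern + @")\z");
    }

    public static void Main()
    {
        // Digits are found somewhere inside the input.
        Console.WriteLine(Regex.IsMatch("abc123", @"\d+")); // True

        // The input as a whole is not made of digits only.
        Console.WriteLine(IsWholeMatch("abc123", @"\d+"));  // False
        Console.WriteLine(IsWholeMatch("123", @"\d+"));     // True
    }
}
```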
Another useful method is Regex.Split(string, string), which separates a string into substrings. For example, you have the string "One, Two, Three" and you want to split it into 3 substrings, separated by commas. The example below splits a text on whitespace instead.
SplitWithRegexExample.cs
using System;
using System.Text.RegularExpressions;

namespace RegularExpressionTutorial
{
    class SplitWithRegexExample
    {
        public static void Main(string[] args)
        {
            // \t: TAB character
            // \n: NewLine character
            string TEXT = "This \t\t is \n my \t text";
            Console.WriteLine("TEXT=" + TEXT);

            // Define a regex:
            // Whitespace appearing 1 or more times.
            // Whitespace characters: \t\n\x0b\r\f
            // Rules: \s and +
            string regex = @"\s+";
            Console.WriteLine(" -------------- ");

            string[] splitString = Regex.Split(TEXT, regex);
            Console.WriteLine(splitString.Length); // ==> 4

            foreach (string str in splitString)
            {
                Console.WriteLine(str);
            }
            Console.WriteLine(" -------------- ");

            // Replace each run of whitespace with a TAB
            string newText = Regex.Replace(TEXT, "\\s+", "\t");
            Console.WriteLine("New text=" + newText);

            Console.Read();
        }
    }
}
Run the example:

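The "One, Two, Three" case mentioned above can be sketched in the same way. The pattern below treats a comma together with any surrounding whitespace as the separator; the class CommaSplitSketch and the helper SplitOnCommas are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommaSplitSketch
{
    // Hypothetical helper: splits on commas, dropping whitespace around them.
    public static string[] SplitOnCommas(string input)
    {
        // \s* : optional whitespace before and after the comma.
        return Regex.Split(input, @"\s*,\s*");
    }

    public static void Main()
    {
        string[] parts = SplitOnCommas("One, Two, Three");
        Console.WriteLine(parts.Length); // 3
        foreach (string part in parts)
        {
            Console.WriteLine("[" + part + "]"); // [One] [Two] [Three]
        }
    }
}
```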
Use the Regex.Matches(...) method to find all substrings of a string that match a regular expression. This method returns a MatchCollection object.
** Regex.Matches() **
public MatchCollection Matches(string input)
public MatchCollection Matches(string input, int startat)
public static MatchCollection Matches(string input, string pattern)
public static MatchCollection Matches(string input, string pattern, RegexOptions options, TimeSpan matchTimeout)
public static MatchCollection Matches(string input, string pattern, RegexOptions options)
The following example extracts the words of a string, which are separated by whitespace.

MatchCollectionExample.cs
using System;
using System.Text.RegularExpressions;

namespace RegularExpressionTutorial
{
    class MatchCollectionExample
    {
        public static void Main(string[] args)
        {
            string TEXT = "This \t is a \t\t\t String";

            // \w : A word character, short for [a-zA-Z_0-9]
            // \w+ : Word character, appearing one or more times.
            string regex = @"\w+";

            MatchCollection matchColl = Regex.Matches(TEXT, regex);

            foreach (Match match in matchColl)
            {
                Console.WriteLine(" ---------------- ");
                Console.WriteLine("Value: " + match.Value);
                Console.WriteLine("Index: " + match.Index);
                Console.WriteLine("Length: " + match.Length);
            }

            Console.Read();
        }
    }
}
Result of running example:

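The instance overload Matches(string input, int startat) listed above begins the search at a given character position. A minimal sketch, assuming a hypothetical class MatchesFromSketch and helper WordsFrom that are not part of the original tutorial:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchesFromSketch
{
    // Hypothetical helper: collects the words found at or after 'startAt'.
    public static List<string> WordsFrom(string input, int startAt)
    {
        var words = new List<string>();
        Regex wordRegex = new Regex(@"\w+");

        // Instance method: the search starts at position 'startAt'.
        foreach (Match m in wordRegex.Matches(input, startAt))
        {
            words.Add(m.Value);
        }
        return words;
    }

    public static void Main()
    {
        // Index 5 is the 'i' of "is", so "This" is skipped.
        List<string> words = WordsFrom("This is a String", 5);
        Console.WriteLine(string.Join(",", words)); // is,a,String
    }
}
```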
You can split a regular expression into groups:
// A regular expression
string regex = @"\s+=\d+";

// Written as three groups, by marking them with ( )
string regex2 = @"(\s+)(=)(\d+)";

// Two groups
string regex3 = @"(\s+)(=\d+)";
Groups can be nested, so a rule is needed for numbering them. The entire pattern is defined as group 0. The remaining groups are numbered by the position of their opening parenthesis, counting from left to right, as in the illustration below:

Note: Use (?:pattern) to tell C# not to treat the parentheses as a capturing group (non-capturing group).
You can define a named capturing group (?<groupName>pattern) or (?'groupName'pattern), and you can access the content matched with match.Groups["groupName"]. The regex is longer, but the code is more meaningful, since it indicates what you are trying to match or extract with the regex.
Named capturing groups can also be accessed via match.Groups[groupIndex]. Note that .NET numbers the unnamed groups first, from left to right, and gives the named groups the numbers that follow.
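The numbering rules can be checked with a small experiment. The sketch below is illustrative only: the pattern, the input "12-34-56", and the class GroupNumberingSketch are assumptions made for this example. It relies on the default .NET behavior in which unnamed groups receive their numbers before named groups (RegexOptions.ExplicitCapture would change this).

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNumberingSketch
{
    // One named group, two nested unnamed groups, one non-capturing group.
    public const string Pattern = @"(?<first>\d+)-((\d+)-(?:\d+))";

    public static void Main()
    {
        Match m = Regex.Match("12-34-56", Pattern);

        Console.WriteLine(m.Groups[0].Value);       // 12-34-56 (entire match)
        Console.WriteLine(m.Groups[1].Value);       // 34-56    (outer unnamed group)
        Console.WriteLine(m.Groups[2].Value);       // 34       (inner unnamed group)
        Console.WriteLine(m.Groups[3].Value);       // 12       (named group, numbered last)
        Console.WriteLine(m.Groups["first"].Value); // 12

        // The non-capturing group (?:\d+) gets no number: 4 groups in total.
        Console.WriteLine(m.Groups.Count);          // 4
    }
}
```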
Let's look at an example using the named group:
NamedGroupExample.cs
using System;
using System.Text.RegularExpressions;

namespace RegularExpressionTutorial
{
    class NamedGroupExample
    {
        public static void Main(string[] args)
        {
            string TEXT = " int a = 100; float b= 130; float c= 110 ; ";

            // Use (?<groupName>pattern) to define a group named: groupName
            // Define a group named 'declare': use (?<declare>...)
            // And a group named 'value': use (?<value>..)
            string regex = @"(?<declare>\s*(int|float)\s+[a-z]\s*)=(?<value>\s*\d+\s*);";

            MatchCollection matchCollection = Regex.Matches(TEXT, regex);

            foreach (Match match in matchCollection)
            {
                Console.WriteLine("Full Text: " + match.Value);
                Console.WriteLine("<declare>: " + match.Groups["declare"].Value);
                Console.WriteLine("<value>: " + match.Groups["value"].Value);
                Console.WriteLine("------------------------------");
            }

            Console.Read();
        }
    }
}
Results of running the example:

To clarify, see the illustration below:

In some situations *? is very important. Take a look at the following example:
// This is a regex:
// any characters, appearing 0 or more times,
// followed by ' and >
string regex = ".*'>";

// TEXT1 matches the regex above.
string TEXT1 = "FILE1'>";

// TEXT2 also matches the regex above.
string TEXT2 = "FILE1'> <a href='http://HOST/file/FILE2'>";

*? will find the smallest match. We consider the following example:
NamedGroupExample2.cs
using System;
using System.Text.RegularExpressions;

namespace RegularExpressionTutorial
{
    class NamedGroupExample2
    {
        public static void Main(string[] args)
        {
            string TEXT = "<a href='http://HOST/file/FILE1'>File 1</a>"
                        + "<a href='http://HOST/file/FILE2'>File 2</a>";

            // Define a group named fileName.
            // * means appearing 0 or more times.
            // *? means the smallest match.
            string regex = "/file/(?<fileName>.*?)'>";

            MatchCollection matchCollection = Regex.Matches(TEXT, regex);

            foreach (Match match in matchCollection)
            {
                Console.WriteLine("File Name = " + match.Groups["fileName"].Value);
                Console.WriteLine("------------------------------");
            }

            Console.Read();
        }
    }
}
Results of running the example:

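To see why the reluctant quantifier matters here, it can help to run the greedy and the reluctant pattern over the same text. The sketch below does that; the class GreedyVsLazySketch and the helper FileNames are hypothetical names introduced for this comparison, and the input is the TEXT value from the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GreedyVsLazySketch
{
    public const string Text =
        "<a href='http://HOST/file/FILE1'>File 1</a>" +
        "<a href='http://HOST/file/FILE2'>File 2</a>";

    // Hypothetical helper: returns the fileName group of every match.
    public static List<string> FileNames(string text, string pattern)
    {
        var names = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            names.Add(m.Groups["fileName"].Value);
        }
        return names;
    }

    public static void Main()
    {
        // Greedy .* runs on to the last '> in the text: a single long match.
        List<string> greedy = FileNames(Text, "/file/(?<fileName>.*)'>");
        Console.WriteLine(greedy.Count); // 1

        // Reluctant .*? stops at the first '> after each /file/.
        List<string> lazy = FileNames(Text, "/file/(?<fileName>.*?)'>");
        Console.WriteLine(lazy.Count);                // 2
        Console.WriteLine(string.Join(",", lazy));    // FILE1,FILE2
    }
}
```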