The Magic of Regular Expressions

What are Regular Expressions?

A regular expression is a pattern that text is matched against. For example, say I specify a pattern: a three-letter word followed by two spaces and then two digits. Text matching that would be ‘xzc  25’ or ‘uzu  88’. Some uses for Regular Expressions (Regex) are validating email addresses, phone numbers or URLs. You can even use Regex to capture (parse) different pieces of information from text.
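As a minimal sketch, here is how that example pattern could be written and tested. I'm using Python purely for illustration, and the helper name matches_example is made up for this snippet.

```python
import re

# Three letters, exactly two spaces, then two digits.
EXAMPLE_PATTERN = re.compile(r"[A-Za-z]{3} {2}\d{2}")


def matches_example(text: str) -> bool:
    """Return True when the whole string fits the pattern."""
    return EXAMPLE_PATTERN.fullmatch(text) is not None


print(matches_example("xzc  25"))  # True
print(matches_example("abc 12"))   # False: only one space
print(matches_example("ab  123"))  # False: two letters, three digits
```

fullmatch is used so that the entire string has to fit; a plain search would also accept the pattern appearing somewhere inside a longer string.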

How they are used

In order to demonstrate their practical use, I’ll use an actual problem that I used Regex to solve as an example. I was making an FtpClient and I had to get file information from what was basically a long continuous string containing many different sets of information. It looked like this:

Each line corresponds to information on a specific file or folder. I added the multiline flag, which makes the anchors ^ and $ match at the start and end of every line instead of only at the start and end of the whole string, so the pattern can be applied to each line separately. This is the pattern that I used to extract the data from the above lines:

Understanding Pattern Matching

At first glance, it might seem intimidating. It really isn’t (unless you try to parse HTML). Regex works by letting you use certain symbols to indicate the content you’re expecting and certain modifiers to indicate how many times that content may appear.

  • \d – Digits – 0-9
  • \w – Word characters – A-Z, a-z, 0-9, _
  • . – Any character (except a newline, by default)
  • \s – Whitespace – spaces, tabs, newlines
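To see these symbols in action, here is a small Python sketch (Python is my choice for illustration, and the sample string is made up). It lists what each symbol picks out of the same piece of text.

```python
import re

# A made-up sample containing letters, digits, an underscore,
# spaces, a tab and punctuation.
sample = "id_7 = 42\t!"

digits = re.findall(r"\d", sample)
word_chars = re.findall(r"\w", sample)
whitespace = re.findall(r"\s", sample)
anything = re.findall(r".", sample)

print(digits)      # ['7', '4', '2']
print(word_chars)  # ['i', 'd', '_', '7', '4', '2']
print(whitespace)  # [' ', ' ', '\t']
# The sample has no newline, so '.' matches every character in it.
print(len(anything) == len(sample))  # True
```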

Then you can add modifiers that can make your request more specific.

  • * – 0 or more
  • + – 1 or more
  • ? – 0 or 1
  • {x} – Specifically x times
  • {x,y} – Anywhere between x and y times
  • {x,} – Minimum x times
  • x|y – x or y
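Each of the modifiers above can be checked with a one-line test. This is a rough Python sketch; the helper name full is made up and simply reports whether the whole text fits the pattern.

```python
import re


def full(pattern: str, text: str) -> bool:
    """Return True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


print(full(r"ab*", "a"))           # True:  * allows zero b's
print(full(r"ab+", "a"))           # False: + needs at least one b
print(full(r"colou?r", "color"))   # True:  ? makes the u optional
print(full(r"\d{3}", "12"))        # False: exactly three digits required
print(full(r"\d{2,4}", "12345"))   # False: at most four digits allowed
print(full(r"\d{2,}", "12345"))    # True:  two or more digits
print(full(r"cat|dog", "dog"))     # True:  either alternative is fine
```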

After that you can group sections into capture groups by surrounding them with parentheses, as in (xy), so you can refer to them later if your Regex matches (and even within the same pattern, but that’s a harder topic to cover now). Note: C# allows you to name a group using (?<GroupSomething>xyz) so you can reference it by name later if the pattern matches a text.
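Here is a small sketch of both kinds of group, again in Python for illustration. One thing to watch: Python spells a named group (?P<name>...) where C# uses (?<name>...). The key=value text is a made-up example.

```python
import re

# Numbered groups: each pair of parentheses captures what it matched.
numbered = re.fullmatch(r"(\w+)=(\d+)", "size=220")
print(numbered.group(1), numbered.group(2))  # size 220

# Named groups: the same captures, looked up by name instead of position.
named = re.fullmatch(r"(?P<key>\w+)=(?P<value>\d+)", "size=220")
print(named.group("key"), named.group("value"))  # size 220
print(named.groupdict())  # {'key': 'size', 'value': '220'}
```

Names tend to age better than positions: adding another pair of parentheses to the pattern shifts the numbers but leaves the names alone.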

Now, looking back at the Regex I wrote, I’ll break it down piece by piece. Compare this to one of the lines:
-rw-r--r-- 1 redacted redacted 220 Apr 18 2010 .bash_logout

  1. (?<dir>d|-) – I expect a ‘d’ or ‘-’ character. [-]
  2. (?<owner>.{3})(?<group>.{3})(?<public>.{3}) – I expect 3 things followed by another 3 things followed by another 3 things. [rw-r--r--]
  3. \s+(?<files>\d) – Then some space followed by a single digit. [  1]
  4. \s+(?<user>\w+) – Then some more spaces followed by some text [   redacted]
  5. \s+(?<idk>\w+) – Even more spaces followed by some more text [   redacted]
  6. \s+(?<size>\d+) – Again, spaces followed by a number [  220]
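The six pieces above can be joined and run against the sample line. This is only a sketch under a few assumptions: it is written in Python (so named groups become (?P<name>...)), the leading ^ anchor and the second listing line are my own additions for illustration, and it stops at the size column because that is where the breakdown stops.

```python
import re

# The six pieces from the breakdown, joined in order.
LISTING_PATTERN = re.compile(
    r"^(?P<dir>d|-)"                                   # 1. directory flag
    r"(?P<owner>.{3})(?P<group>.{3})(?P<public>.{3})"  # 2. permissions
    r"\s+(?P<files>\d)"                                # 3. single digit
    r"\s+(?P<user>\w+)"                                # 4. some text
    r"\s+(?P<idk>\w+)"                                 # 5. some more text
    r"\s+(?P<size>\d+)",                               # 6. size
    re.MULTILINE,  # let ^ match at the start of every line
)

# The first line is from the post; the second is a made-up directory entry.
listing = (
    "-rw-r--r-- 1 redacted redacted 220 Apr 18 2010 .bash_logout\n"
    "drwxr-xr-x 2 redacted redacted 4096 Apr 18 2010 docs\n"
)

entries = [m.groupdict() for m in LISTING_PATTERN.finditer(listing)]
for entry in entries:
    print(entry["dir"], entry["owner"], entry["size"])
# - rw- 220
# d rwx 4096
```

Because piece 3 is a single \d, a line whose count has two or more digits would not match this version of the pattern.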

Another way of using Regex is to validate input data. Writing a validation Regex can be simple, but people often challenge themselves to write the shortest, most efficient Regex possible. Here’s one of my favourites:
/\b(?:(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\.){3}(?:(?:2([0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9]))\b/ig
This is used for validating IPv4 addresses: it expects four numbers separated by dots, each between 0 and 255.
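As a rough sketch, the same pattern can be tried out in Python. The /…/ig wrapper is JavaScript-style syntax, so I've dropped the delimiters and flags and kept the pattern text itself. The helper name is_ipv4 is made up, and using fullmatch (the whole string must be an address) is my choice for validation rather than something the original pattern requires.

```python
import re

# The pattern from above, without the /.../ig delimiters and flags.
IPV4_PATTERN = re.compile(
    r"\b(?:(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\.){3}"
    r"(?:(?:2([0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9]))\b"
)


def is_ipv4(text: str) -> bool:
    """Return True when the whole string is a dotted IPv4 address."""
    return IPV4_PATTERN.fullmatch(text) is not None


print(is_ipv4("192.168.1.1"))      # True
print(is_ipv4("255.255.255.255"))  # True
print(is_ipv4("256.1.1.1"))        # False: 256 is out of range
print(is_ipv4("1.2.3"))            # False: only three numbers
```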

Resources

If you want to explore the world of Regular Expressions and learn how to apply them to your projects, visit RegExr. If you want to test your skills or knowledge while having fun on the side, then I would recommend Regex Crossword!

Written by: AryanMann