REGEX formulas in Excel are now rolling out!

REGEX formulas in Excel are now rolling out in the current channel of Excel, just in time for Christmas.

In this post I want to explain why most Excel users, not just hardcore geeks, should be interested in this.

In a nutshell, REGEX formulas in Excel mean that you suddenly have thousands more options for data cleansing and extraction than you did before.

What are REGEX formulas?

REGEX is short for Regular Expressions. Regular Expressions have existed in one form or another as long as we’ve had computer programming. They give us ways to match patterns in a string of text.

There are currently three REGEX formulas in Excel.

  • REGEXEXTRACT: Extracts a pattern of data from a string of text.
  • REGEXREPLACE: Searches for a regex pattern within the text and replaces it with different text.
  • REGEXTEST: Checks if any part of the text matches a regex pattern.

The most useful of these Excel functions for finance teams is likely to be REGEXEXTRACT, so I’m using that as an example in this post. However, the others work in a similar way.

Why are REGEX formulas in Excel so useful?

Because Regular Expressions have been around for so long, there is a huge amount of documentation available. That is – ready-made code you can steal. So as long as there is some kind of pattern to your data, you can probably find someone who has already solved your problem and borrow their solution.

Let’s look at an example.

In the image above we have a list of manual adjustments to the ledger. We want to extract all the Purchase Order references from the data in column A. All PO references are eight digits long.

We wouldn’t be able to solve this in Excel before REGEX formulas (at least not without using Power Query or VBA). Traditional Excel text formulas like FIND and MID don’t help in this case because there is nothing to anchor the start of the eight digit string to. The data doesn’t always have “PO” in front.

However, there is a consistent 8 digit pattern.

And with about ten minutes of internet searching and testing, I had the solution I wanted:

=REGEXEXTRACT(A4,”(?<!\d)\d{8}(?!\d)”)

(Cell reference A4 is where the original text is. The bit in bold is the bit I stole off someone else)

Using other people’s solutions

Now, you have to be a bit careful if you are new to Regular Expressions. The reason the above took me ten minutes as opposed to thirty seconds is because I did initially find and try this as a solution:

=REGEXEXTRACT(A4,”(\d{8})”)

However, on testing it, I found that it will include digits that are more than 8 digits long, which isn’t what I wanted.

So you definitely need to test what you find. Generative AI tools such as ChatGPT are both a blessing and curse with REGEX. A blessing because it is quite good at helping you get to the right solution. A curse because people are clearly using these tools to mass generate thousands of web pages, so there’s a lot more dross out there.

(I would always look at results from sites such as Stack Overflow or even Reddit where actual users have contributed.)

regex101.com is a useful resource for testing and learning more about REGEX.

I am in the process of putting together a little workbook of specific Regular Expressions that I think will be useful for finance teams. It will be available to my newsletter subscribers and to existing and new clients sometime in spring 2025.





If you want a quarterly newsletter summarising latest developments in Excel and other things I've found interesting or useful, please sign up here. I won't use your data for anything else.

Leave a Reply

Your email address will not be published. Required fields are marked *