
replacing many grouped regular expression matches in PowerShell

January 11th, 2010 by Karl

PowerShell has some really good built-in language features for dealing with regular expressions, such as the -replace and -match operators. Regex filtering is also available in many cmdlets, such as Select-String, and if that's not enough you can call the .NET Regex class ([regex]) directly.

However, things get harder when you need more than the first match, or when the replacement is not a simple string substitution. There were many cases where a plain substitution was not enough and I really needed to apply some logic at each match to work out what to replace it with.

The problem was that it took me about 5 or 6 lines of code each time, and it broke both my "keep everything in the pipeline" goal and my workflow. So here is a function to do this. In reality it could be a lot more robust, and it could be part of a whole family of such functions carefully sculpted for your regex needs.
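For comparison, here is a minimal sketch of the kind of inline code such a helper replaces: calling the .NET Regex class directly and passing a scriptblock, which PowerShell converts to a MatchEvaluator delegate that runs once per match. The $folderNames lookup table and the sample HTML are hypothetical, made up only to show per-match logic that a plain string substitution cannot express.

```powershell
# Hypothetical lookup table (illustration only): folder id -> folder name.
$folderNames = @{ '284' = 'projects'; '109' = 'archive' }
$html = '<a href="wiki://284_636">page</a> <a href="wiki://109_49">old</a>'

# The scriptblock becomes a MatchEvaluator: it runs once per match and
# its output is used as the replacement text for that match.
$result = [regex]::Replace($html, 'wiki://(?<folder>\d+)_(?<page>\d+)', {
    param($m)   # the System.Text.RegularExpressions.Match for this hit
    $folder = $m.Groups['folder'].Value
    $page   = $m.Groups['page'].Value
    # per-match logic: translate the folder id, with a fallback name
    $name = $folderNames[$folder]
    if (-not $name) { $name = "folder$folder" }
    "$name/$page/index.html"
})
$result
```

Every match is replaced, not just the first. PowerShell 6.1 and later also accept a scriptblock on the right-hand side of -replace, with the current match in $_.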

More important than the regex itself, this is a good example of refactoring PowerShell logic into a one-liner, and of using a scriptblock in a "closure" manner: the end user passes in a scriptblock they wrote, and it gets some automatic variables from the scope of your function. Let's look at the function and an example.

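function replace-regexgroup ([regex]$regex, [string]$text, [scriptblock]$replaceexpression)
{
    # PowerShell converts this scriptblock to a MatchEvaluator delegate,
    # so it runs once per match, with the Match object in $args[0]
    $regex.Replace($text, {
        $thematch = $args[0]
        # one variable per group, named after the group (e.g. 0, folder, page)
        foreach ($groupname in $regex.GetGroupNames())
        {
            Set-Variable -Name $groupname -Visibility Private -Value ($thematch.Groups[$groupname].Value)
        }
        # the caller's scriptblock can read those variables; its output replaces the match
        if ($replaceexpression -ne $null) { & $replaceexpression }
    })
}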

So the user passes in the regular expression, the text they want to do the replace on, and a scriptblock: the expression that runs for each match, whose output becomes that match's replacement. I use Set-Variable to create a variable for each group in the regex, named after that group, in the scope where the per-match code runs. (I really should also provide $1, $2, $3, as many people are used to from Perl and JavaScript.) Then I call the scriptblock with &$replaceexpression. I haven't decided, in scenarios like this, whether to invoke such scriptblocks with &, which creates a child scope for them to run in, or to run them in my own scope with the dot-source operator ".".
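The practical difference between the two invocation styles can be seen in a small sketch. Test-InvocationScope is a hypothetical function written only for this illustration: it defines $folder, runs a caller-supplied scriptblock with either & or ".", and then reports whether a variable assigned inside the block survived in the function's scope.

```powershell
# Hypothetical demo function (illustration only): runs a caller-supplied
# scriptblock with & (new child scope) or, with -DotSource, with . (current scope).
function Test-InvocationScope ([scriptblock] $block, [switch] $DotSource)
{
    $folder = '284'   # defined in the function's scope
    if ($DotSource) { . $block } else { & $block }
    # $leaked only exists here if the block ran in this function's own scope
    "leaked = '$leaked'"
}

# Both styles let the block read $folder; only dot-sourcing keeps its assignments.
Test-InvocationScope { "block sees $folder"; $leaked = 'yes' }
Test-InvocationScope { "block sees $folder"; $leaked = 'yes' } -DotSource
```

With &, the block can still read the function's variables, but anything it assigns disappears with its child scope. Dot-sourcing lets the block add or overwrite variables in the function itself, which is riskier for a general-purpose helper. The sketch assumes strict mode is off, since $leaked is read before it is ever assigned in the & case.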

So how do we use this? The first scenario I built it for was exporting wiki contents from a desktop app that stored its pages as HTML in a database. I wanted a backup of this content as plain HTML, and it contained links such as these:

$example = @"
<P><a href="wiki://284_636">links to test page 2</a></P>
<P><a href="wiki://109_49">
"@

I needed to find all those wiki links and transform each one based on the two numbers on either side of the underscore (284 and 636 in the first case), which I did with the following line of code:

replace-regexgroup 'wiki://(?<wholething>(?<folder>\d+)_(?<page>\d+))' $example { "$folder/$page/index.html" }

You can see that the regex names its groups, and that my scriptblock is simply one string using variable expansion with the automagically created variables $folder and $page, one per regex group.
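To see exactly which variables the function creates for this pattern, you can list the regex's group names next to what each one captured for a single match. This is a standalone sketch using only the .NET Regex API, run against the first link from the example above.

```powershell
# List each group name of the pattern with the text it captured;
# replace-regexgroup creates one variable per name in this list.
$regex = [regex]'wiki://(?<wholething>(?<folder>\d+)_(?<page>\d+))'
$m = $regex.Match('<P><a href="wiki://284_636">links to test page 2</a></P>')

$lines = foreach ($name in $regex.GetGroupNames()) {
    '{0} = {1}' -f $name, $m.Groups[$name].Value
}
$lines
# 0 = wiki://284_636
# wholething = 284_636
# folder = 284
# page = 636
```

Besides $folder and $page, the function therefore also creates $wholething and a variable named 0 for the whole match (reachable as ${0}).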

The script is also posted on PoshCode.

I hope this is more than just a good starting point for a regular expression function, and that it also helps teach the value of scriptblocks and how they can be used to write reusable code that simplifies your workflow.

-Karl

Posted in PS Refactored, Powershell | 4 Comments »

4 Responses

  1. stej Says:

    Just a note for readers: it is possible to do it in this way:

    $example -replace 'wiki://(?<wholething>(?<folder>\d+)_(?<page>\d+))','${folder}/${page}/index.html'

    However, it’s a nice demonstration how to use variables and scriptblocks.

  2. Jason Archer Says:

    A nice simple example of using user provided script blocks. Thanks.

    BTW, the particular task you were doing could be accomplished without custom code:

    $example -replace 'wiki://(\d+)_(\d+)','$1/$2/index.html'

    But there are some cases where the regex could be simplified by using code to assist in the processing.

  3. Karl Says:

    lol, I can't believe I didn't know that. I had thought that -replace only replaced the first match. I'll have to look at other things and remember why I abandoned the built-in -match and -replace and went to the .NET methods. There was a reason, but I must have remembered it wrongly. Thanks guys

  4. stej Says:

    (Some characters were removed, so I'll try the one-liner again:

    $example -replace 'wiki://(?<wholething>(?<folder>\d+)_(?<page>\d+))','${folder}/${page}/index.html'

    This one should be correct :) )
