Peter 'Happy' Thomas (happypete) wrote,

I hate C# and the .NET Framework Regex namespace right now

I figured it out. Word paragraphs render into a string buffer as \r, not \n, and the Regex engine only considers $ or ^ to be end-of-line or beginning-of-line based on \n.
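One way to act on that is to normalize the line endings before matching. The following is only a minimal sketch, not necessarily what I ended up shipping: the class and method names are made up for illustration, the group names number, title, and text are assumed, the dot in the lookahead is escaped, and the sample buffer is invented.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AreaBlockDemo
{
    // Pattern adapted from the post. The group names (number, title, text)
    // are assumptions for illustration, and the lookahead dot is escaped.
    private const string AreaBlockPattern =
        @"^\s*(?<number>\d+\.0)\s*(?<title>.*?)$(?s:(?<text>.*?))((?=^\s*\d+\.0)|\Z)";

    // Convert \r\n pairs and lone \r characters to \n, so that ^ and $
    // (under RegexOptions.Multiline) see a line break where Word put a paragraph.
    public static string NormalizeLineEndings(string input)
    {
        return input.Replace("\r\n", "\n").Replace("\r", "\n");
    }

    // Hypothetical helper: normalize first, then run the multiline match.
    public static MatchCollection FindAreaBlocks(string buff)
    {
        return Regex.Matches(NormalizeLineEndings(buff), AreaBlockPattern, RegexOptions.Multiline);
    }

    public static void Main()
    {
        // Invented sample text using \r as the paragraph separator, as Word does.
        string buff = "[some text]\r1.0 First Area\r[body one]\r2.0 Second Area\r[body two]\r";

        // Matching the raw buffer: ^ only matches at the very start of the string.
        int rawCount = Regex.Matches(buff, AreaBlockPattern, RegexOptions.Multiline).Count;
        Console.WriteLine("raw matches: " + rawCount);

        // Matching after normalization: one match per numbered heading.
        foreach (Match m in FindAreaBlocks(buff))
        {
            Console.WriteLine(m.Groups["number"].Value + " | " + m.Groups["title"].Value);
        }
    }
}
```

On the sample buffer the raw call should find nothing, while the normalized call should find the 1.0 and 2.0 blocks. The \r\n replacement runs first so that a Windows-style pair does not turn into two line breaks.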
Okay, so say I've got a Regex like so:

const string _regexAreaBlock = @"^\s*(?<number>\d+\.0)\s*(?<title>.*?)$(?s:(?<text>.*?))((?=^\s*\d+.0)|\Z)";

and a function call like so:

MatchCollection matches = Regex.Matches(buff, _regexAreaBlock, RegexOptions.Multiline);

Where buff looks like:

[some text]

1.0 Mission Capabilities/Requirements Assessment Area

[lots of text that I'm going to do something with in a minute]

2.0 Resource Assessment Area

[more text, that I'm going to do something with after I get done with the first block of text]

3.0 [...and so on]

Why is it not matching ANY text, let alone not neatly matching each block and capturing the number, title, and text in the appropriately named groups?

Oh, if it helps you:

const string _regexAreaBlock = @"(\d\.0)\b((?![\r\n][\t ]*\d\.0).)+";

without the Multiline option seems to do approximately the right thing, under most circumstances...
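My best guess at why this one copes with Word's text: it uses no ^ or $ anchors at all, and . in .NET matches every character except \n, so it steps straight over \r. Here is a small sketch of it running against an invented \r-separated buffer; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FallbackPatternDemo
{
    // The fallback pattern from the post, unchanged.
    private const string FallbackPattern = @"(\d\.0)\b((?![\r\n][\t ]*\d\.0).)+";

    // Hypothetical helper: no RegexOptions, so . matches anything except \n.
    public static MatchCollection FindBlocks(string buff)
    {
        return Regex.Matches(buff, FallbackPattern);
    }

    public static void Main()
    {
        // Invented sample text using \r as the paragraph separator.
        string buff = "[some text]\r1.0 First Area\r[body one]\r2.0 Second Area\r[body two]\r";

        foreach (Match m in FindBlocks(buff))
        {
            // Group 1 is the section number; the block itself is the whole match,
            // shown here with \r replaced so it prints on one line.
            Console.WriteLine(m.Groups[1].Value + ": " + m.Value.Replace("\r", " / "));
        }
    }
}
```

Two caveats that probably explain the "most circumstances" part. Group 2 is a repeated capture, so its Value is only the last character it consumed and the block text has to come from the whole match. And \d\.0 allows a single digit only, so a section numbered 10.0 would be picked up as 0.0.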
