
To: Texas Fossil

Regex is, quite simply, the wrong tool to use in most cases.

Like you said, if your data isn’t structured in a particular way, it doesn’t work; but more than that, it’s VERY easy to leave out some case when writing one. (I’ve had to make a few of those corrections, and it’s not fun; in my experience, if a regex is even moderately complicated, it’s going to need changing.)
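To illustrate how easy it is to leave out a case, here is a small sketch in Python (the pattern names and sample strings are made up for illustration). A first-pass date pattern misses one-digit months and two-digit years; the “fixed” version catches those but still misses the ISO form and happily accepts a nonsense date.

```python
import re

# A first attempt at matching US-style dates such as 10/28/2011.
NAIVE_DATE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")

# A "fixed" version that allows one-digit months/days and two-digit years.
WIDER_DATE = re.compile(r"\b\d{1,2}/\d{1,2}/(?:\d{4}|\d{2})\b")

samples = ["10/28/2011", "1/5/2011", "10/28/11", "2011-10-28", "99/99/2011"]

for s in samples:
    naive = bool(NAIVE_DATE.search(s))
    wider = bool(WIDER_DATE.search(s))
    print(f"{s:12} naive={naive!s:5} wider={wider}")
```

The naive pattern misses “1/5/2011” and “10/28/11”; the wider one catches both, still misses “2011-10-28”, and accepts “99/99/2011”. Every patch opens a new gap somewhere else.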

I don’t really trust regex for anything more complicated than a) removing certain characters, and b) *simple* transforms, like perhaps changing case.
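The two “safe” uses above look something like this in Python; a minimal sketch, where `strip_chars`, `upper_codes`, and the “ab-123” code format are hypothetical names chosen for illustration.

```python
import re

def strip_chars(text: str, chars: str) -> str:
    """Remove every occurrence of any character listed in `chars`."""
    if not chars:
        return text  # an empty class "[]" would be an invalid pattern
    # re.escape keeps characters like "-", "]" or "^" from acting as class syntax.
    return re.sub("[" + re.escape(chars) + "]", "", text)

def upper_codes(text: str) -> str:
    """Upper-case hypothetical item codes of the form 'ab-123'."""
    return re.sub(r"\b[a-z]{2}-\d+\b", lambda m: m.group(0).upper(), text)

print(strip_chars("(555) 123-4567", "()- "))
print(upper_codes("order ab-12 and cd-345"))
```

Both patterns are short enough to check by eye, which is the point: there is little room for a forgotten case.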

9 posted on 10/28/2011 6:34:53 PM PDT by OneWingedShark (Q: Why am I here? A: To do Justly, to love mercy, and to walk humbly with my God.)

To: OneWingedShark

I used regex with grep to recover the item-number sequencing for a 3,000-page catalog. With it I could pull the item sequence (as edited for print) out of the actual .pdf that I sent to the printer. I also used it to extract the page number for each item in the print document.

To make that work, I unzipped (decompressed) the .pdf with PDFtk and used an editor to remove the proportional-spacing syntax between the numeral “1” and a following “1” (numeral one). Then I used keying codes for the page number and item number. It worked beautifully. But it only worked because the text data had been imported into Quark with Xdata or Xtags, so the document was, in effect, extruded from the data.
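A rough sketch of the idea in Python, under several assumptions not stated in the post: the PDF was decompressed with something like `pdftk in.pdf output out.pdf uncompress`; the proportional spacing appears as numeric kerning offsets between string segments inside `TJ` arrays (e.g. `[(1)-27.8(1)] TJ`); and the keying codes look like the hypothetical markers `@PG<page>` and `@IT<item>`. Unlike the manual edit described above, this removes offsets between every pair of segments, not just between two “1”s.

```python
import re

# Assumed: a kerning offset (thousandths of a text-space unit) sits between two
# string segments of a TJ array, e.g. "(1)-27.8(1)". Deleting ")<number>(" or
# ")(" joins the segments. Simplification: escaped parentheses are not handled.
SEGMENT_BREAK = re.compile(r"\)\s*(?:-?(?:\d+\.?\d*|\.\d+))?\s*\(")

# Hypothetical key codes assumed to be in the layout: @PG<page> and @IT<item>.
MARKER = re.compile(r"@(PG|IT)(\d+)")

def merge_tj_segments(stream: str) -> str:
    """Join adjacent string segments so split digits read as one number."""
    return SEGMENT_BREAK.sub("", stream)

def item_pages(stream: str) -> list[tuple[int, int]]:
    """Return (item number, page number) pairs in stream order."""
    page = None
    pairs = []
    for kind, number in MARKER.findall(merge_tj_segments(stream)):
        if kind == "PG":
            page = int(number)
        elif page is not None:  # items seen before any page marker are skipped
            pairs.append((int(number), page))
    return pairs

# Hypothetical fragment of an uncompressed content stream.
sample = """BT /F1 9 Tf 72 720 Td [(@PG)-5(2)-27.8(1)] TJ ET
BT 72 700 Td [(@IT)(1)-27.8(1)(4)] TJ ET
BT 72 680 Td [(@IT)(2)-10(0)(7)] TJ ET"""

print(item_pages(sample))
```

As in the post, this only works because the markers were placed consistently when the document was generated from the data; on a hand-built layout the pattern would have nothing reliable to key on.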

In some cases it will also work with XML for similar purposes.

Grep (which handles Regex) in the right hands is a very powerful tool for manipulating text data.

10 posted on 10/28/2011 8:04:06 PM PDT by Texas Fossil (Government, even in its best state is but a necessary evil; in its worst state an intolerable one)

FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794