To: OneWingedShark
I used Regex with Grep to recover the item-number sequencing for a 3,000-page catalog. With it I could recover the item sequence (after it was edited for print) from the actual .pdf that I sent to the printer. I also used it to extract the page number for each item in the print document.
In order to make that work I unzipped the .pdf with PDFtk and used an editor to remove the proportional-spacing syntax for a numeral “1” followed by another “1” (numeral one). Then I used keying codes for the page number and item number. It worked beautifully. But it only worked because the text data was imported into Quark with Xdata or Xtags. It was like extruding the document from the data.
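For readers who would rather script the spacing cleanup than do it in an editor, here is a rough Python sketch. It rests on an assumption the post does not confirm: that the "proportional spacing syntax" shows up in the uncompressed PDF as a `TJ` array with a kerning number between the two digits, for example `[(1)-74(1)] TJ`. The function name `collapse_kerning` is made up for illustration, and strings containing escaped parentheses or backslashes are deliberately left alone.

```python
import re

# Assumption: a kerned run looks like [(1)-74(1)] TJ in the uncompressed
# content stream. Strings with escapes or nested parentheses do not match
# and are left untouched.
TJ_ARRAY = re.compile(r"\[((?:\([^()\\]*\)|[-\d.\s])+)\]\s*TJ")
STRING_PART = re.compile(r"\(([^()\\]*)\)")

def collapse_kerning(stream_text: str) -> str:
    """Join the string pieces of each TJ array into one plain Tj string."""
    def join(match: re.Match) -> str:
        pieces = STRING_PART.findall(match.group(1))
        return "(" + "".join(pieces) + ") Tj"
    return TJ_ARRAY.sub(join, stream_text)

print(collapse_kerning("[(Item 1)-74(1)] TJ"))
```

Dropping the kerning numbers changes how the text would render, so this is only suitable for a working copy used for searching, not for the file that goes to the printer.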
In some cases it will also work with XML for similar purposes.
Grep (which handles Regex) in the right hands is a very powerful tool for manipulating text data.
10 posted on 10/28/2011 8:04:06 PM PDT by Texas Fossil (Government, even in its best state is but a necessary evil; in its worst state an intolerable one)
To: Texas Fossil
>Grep (which handles Regex) in the right hands is a very powerful tool for manipulating text data.
I won't argue against that; I've heard some amazing things done with it... but in my own experience the regex in use is always breaking/broken. I'm slightly more inclined to learn SNOBOL rather than gain a [working-]mastery of regex as 1) it will be another [programming] language to learn, and 2) it seems, just from an overview, to be more *useful* for pattern-matching. (And, as a plus, apparently it's been ported over to GNAT's Ada compiler as a compiler-specific package.)
11 posted on 10/28/2011 8:18:12 PM PDT by OneWingedShark (Q: Why am I here? A: To do Justly, to love mercy, and to walk humbly with my God.)
To: Texas Fossil
>Grep (which handles Regex) in the right hands is a very powerful tool for manipulating text data.
For non-trivial text manipulation tasks, you are MUCH better off with something like Perl.
15 posted on 10/29/2011 9:27:18 AM PDT by PapaBear3625 (When you've only heard lies your entire life, the truth sounds insane.)
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson