Regex is, quite simply, the wrong tool to use in most cases.
Like you said, if your data isn’t structured in a particular way it doesn’t work; but more than that, it’s VERY easy to leave out some case when writing a pattern. (I’ve had to do a few of those corrections, not fun; and in my experience, if the pattern is even moderately complicated it’s going to change.)
I don’t really trust regex for anything more complicated than a) removal of certain characters, and b) *simple* transforms, like perhaps changing case.
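To make that concrete, here is a rough sketch in Python of the two kinds of job I mean. The function names and sample strings are made up for illustration; nothing here comes from a real project. The second function also shows how quickly a missed case creeps in, even with a "simple" pattern.

```python
import re


def strip_chars(text, unwanted):
    """Remove every character listed in `unwanted` from `text`."""
    if not unwanted:
        return text  # an empty character class "[]" is not a valid pattern
    # re.escape keeps characters like "-" or "]" literal inside the class
    return re.sub("[" + re.escape(unwanted) + "]", "", text)


def capitalize_words(text):
    """Uppercase a lowercase ASCII letter that starts a word (naive on purpose)."""
    return re.sub(r"\b[a-z]", lambda m: m.group().upper(), text)


print(strip_chars("a-b_c", "-_"))       # abc
print(capitalize_words("hello world"))  # Hello World
print(capitalize_words("don't stop"))   # Don'T Stop  <- the case I forgot
```

The apostrophe counts as a word boundary, so the "t" after it gets capitalized too. That is exactly the sort of thing you only find after the pattern has already run over real data.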
In order to make that work I uncompressed the .pdf with PDFtk and used an editor to remove the proportional spacing syntax for the numeral “1” followed by “1” (numeral one). Then I used keying code for page number and item number. It worked beautifully, but only because the text data was imported into Quark with Xdata or Xtags. It was like extruding the document from the data.
In some cases it will also work with XML for similar purposes.
Grep (which handles Regex) in the right hands is a very powerful tool for manipulating text data.