Help Optimizing Search @ Work.
We have too many pre-press documents at work, so I need help figuring out how to organize them better.
We have thousands of PDF and PageMaker documents at work, since half of Adkins is a printing company. I need to build a database of the content on our file server, but I’m not quite sure how to get the content out of all the PDFs and PageMaker docs. PDFs are relatively easy to rip good info from, but the PageMaker files seem impossible. I know they have text in them and can be read somehow, I’m just not sure how. Could someone point me in the right direction? Note: I did find software that runs on a Windows computer and batch converts PageMaker files to PDF. I could use it, but it’d be quite inefficient since the files are on a BSD server, and I’d have to keep the PM originals anyway.
Once I can get the text into a database, I can search through it and ‘test’ ways to organize the files so they are easier to use. Or, if different ways of organizing the files work for different people at Adkins, I could stop caring about how the files are organized on the file system itself and just organize views out of the database so everyone’s happy.
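To make the search idea a bit more concrete, here is a minimal sketch in Ruby. It is only an in-memory stand-in for a real database, and the class name `TextIndex`, its methods, and the file paths are all made up for illustration. It maps each word to the set of files containing it, so a query returns the files that contain every word.

```ruby
require "set"

# Hypothetical in-memory stand-in for the database described above.
# Maps each lowercase word to the set of file paths that contain it.
class TextIndex
  def initialize
    @postings = Hash.new { |hash, word| hash[word] = Set.new }
  end

  # Record every word of `text` as occurring in the file at `path`.
  def add(path, text)
    text.downcase.scan(/[a-z0-9]+/).each { |word| @postings[word] << path }
  end

  # Return the sorted paths of files that contain every word in `query`.
  def search(query)
    words = query.downcase.scan(/[a-z0-9]+/)
    return [] if words.empty?

    words.map { |word| @postings.fetch(word, Set.new) }.reduce(:&).to_a.sort
  end
end

# Example usage with made-up paths and text.
index = TextIndex.new
index.add("jobs/flyer.pm6", "Spring Sale flyer")
index.add("jobs/menu.pdf", "Spring menu")
index.search("spring flyer")
```

Because the index only stores paths, the files can stay wherever they are on the server, and different people could get different groupings from the same index.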
I'm dumb. I poked at Ruby and a solution came in a few seconds (screenshot).
– Download PM2Text: I'm not sure what good it'll do anyone else, but here is a little Ruby script to rip the text out of a PageMaker file. I'll update the download once in a while if I remember. It doesn't yet strip font names or other textual garble out of the output.
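For anyone curious what this kind of ripping can look like, here is a rough sketch. It is not the PM2Text script itself; it assumes the text sits in the file as plain runs of printable ASCII bytes, which may not hold for every PageMaker version. The method names and the `min_length` parameter are made up for illustration.

```ruby
# Hypothetical helper, not the actual PM2Text code.
# Returns every run of at least `min_length` printable ASCII bytes.
def extract_text_runs(bytes, min_length = 4)
  # Work on a binary copy so stray high bytes never raise encoding errors.
  data = bytes.dup.force_encoding(Encoding::BINARY)
  data.scan(/[\x20-\x7E]{#{min_length},}/n)
end

# Hypothetical wrapper: read a file in binary mode and join the runs.
def extract_text_from_file(path, min_length = 4)
  extract_text_runs(File.binread(path), min_length).join("\n")
end

# Example with made-up bytes standing in for a document.
sample = "\x00\x01Hello World\xFF\x02ab\x00Print shop flyer\x00".b
extract_text_runs(sample)
```

Short runs like `ab` get dropped by the length cutoff, but font names and similar internal strings are printable too, so they would still come through with this approach.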