Author Topic: WinXp pdf to html converter? (Read 4090 times)

rcjordan · « **on:** February 09, 2013, 12:16:11 PM »

I have some pdf files that would make good longtail fodder. Anybody used a converter that wasn't crap? It doesn't have to be freeware.

IrishWonder · « **Reply #1 on:** February 09, 2013, 03:23:57 PM »

Mechanical Turk?

rcjordan · « **Reply #2 on:** February 09, 2013, 07:16:56 PM »

That would be my last resort.

IrishWonder · « **Reply #3 on:** February 09, 2013, 07:23:13 PM »

Well if nothing else works...

Rumbas · « **Reply #4 on:** February 09, 2013, 09:18:51 PM »

How many files? A lot I guess?

rcjordan · « **Reply #5 on:** February 11, 2013, 05:10:15 AM »

A metric crapload. Parts manuals, brochures, tons of them. The pdfs were just scans done a few years back, so the content is basically page after page of images.

I found one program that did break out each page into an individual jpg and then I could OCR it. That worked, but it's going to be too much work reassembling the pages in html.
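
The reassembly step could probably be scripted rather than done by hand. A minimal sketch, assuming the OCR pass leaves one plain-text file per page in a folder, with the page number somewhere in the file name (e.g. `page_001.txt`, a hypothetical naming scheme). The function names `page_number` and `build_html` are made up for illustration. It uses only the Python standard library and does not attempt to recover layout, tables, or images:

```python
import html
import re
from pathlib import Path


def page_number(path):
    """Return the first run of digits in the file name, or -1 if none."""
    match = re.search(r"\d+", path.stem)
    return int(match.group()) if match else -1


def build_html(ocr_dir, title):
    """Join per-page OCR text files into one HTML document string."""
    # Sort numerically so page_10 comes after page_2.
    pages = sorted(Path(ocr_dir).glob("*.txt"), key=page_number)
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        f"<title>{html.escape(title)}</title></head><body>",
    ]
    for path in pages:
        text = path.read_text(encoding="utf-8", errors="replace")
        parts.append(f'<section id="page-{page_number(path)}">')
        # Assumption: blank lines in the OCR text mark paragraph breaks.
        for block in re.split(r"\n\s*\n", text):
            block = " ".join(block.split())  # collapse line wraps
            if block:
                parts.append(f"<p>{html.escape(block)}</p>")
        parts.append("</section>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

Something like `Path("manual.html").write_text(build_html("ocr_out", "Parts manual"), encoding="utf-8")` would then write one HTML file per manual. Here `ocr_out` is a placeholder folder name.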

JasonD · « **Reply #6 on:** February 11, 2013, 05:18:51 PM »

http://pdftohtml.sourceforge.net/

I'm not so sure the text will end up as true html though. Without OCR you may simply get images.

littleman · « **Reply #7 on:** February 11, 2013, 05:27:51 PM »

You could just break down and get yourself a Linux box, RC.

rcjordan · « **Reply #8 on:** February 11, 2013, 07:14:39 PM »

>not so sure the text will end up as true html

Yeah, I'd seen that linux script and wondered the same thing. I'm thinking that it's talking about more recent types of pdf which were converted from Word docs or rtf.

>linux box

I'm sure I have one in a pile around here somewhere, hhh.

Rumbas · « **Reply #9 on:** February 13, 2013, 09:39:47 PM »

Fiverr.com?

rcjordan · « **Reply #10 on:** February 14, 2013, 02:36:03 AM »

>5r

Looking more and more like it.

I did try vsisoft today. It did spit out all the images, numbered by the page it extracted them from. That could be handy. But the job is still a massive PITA. I'll probably trash the idea.

JasonD · « **Reply #11 on:** February 14, 2013, 09:42:52 PM »

https://crocodoc.com/

Just read about it, then played with it - Looks very interesting

werty · « **Reply #12 on:** February 19, 2013, 03:16:04 AM »

You also might be able to use: http://finereader.abbyy.com/

That came w/ my scanner and turns all the PDFs into searchable pdfs and does the OCR and whatnot. Then you could copy and paste.

Same with Evernote. Not sure if the free one will do OCR or not, but you could then cut and paste.

This also seems like it may do a pretty nice job: http://www.verypdf.com/app/pdf-to-html-converter/index.html
