Author Topic: WinXp pdf to html converter?  (Read 4090 times)

rcjordan

  • I'm consulting the authorities on the subject
  • Global Moderator
  • Hero Member
  • *****
  • Posts: 16345
  • Debbie says...
    • View Profile
WinXp pdf to html converter?
« on: February 09, 2013, 12:16:11 PM »
I have some pdf files that would make good longtail fodder. Anybody used a converter that wasn't crap? It doesn't have to be freeware.

IrishWonder

  • Inner Core
  • Hero Member
  • *
  • Posts: 561
    • View Profile
    • IrishWonder's SEO Consulting
Re: WinXp pdf to html converter?
« Reply #1 on: February 09, 2013, 03:23:57 PM »
Mechanical Turk?

rcjordan

Re: WinXp pdf to html converter?
« Reply #2 on: February 09, 2013, 07:16:56 PM »
That would be my last resort.

IrishWonder

Re: WinXp pdf to html converter?
« Reply #3 on: February 09, 2013, 07:23:13 PM »
Well if nothing else works...

Rumbas

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2106
  • Viking Wrath
    • MSN Messenger - rasmussoerensen@hotmail.com
    • AOL Instant Messenger - seorasmus
    • View Profile
Re: WinXp pdf to html converter?
« Reply #4 on: February 09, 2013, 09:18:51 PM »
How many files? A lot I guess?

rcjordan

Re: WinXp pdf to html converter?
« Reply #5 on: February 11, 2013, 05:10:15 AM »
A metric crapload. Parts manuals, brochures, tons of them. The PDFs were just scans done a few years back, so the content is basically page after page of images.

I found one program that did break out each page into an individual JPG, and then I could OCR it. That worked, but it's going to be too much work reassembling the pages in HTML.
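For what it's worth, the reassembly step could probably be scripted. Below is a rough Python sketch, not a tested workflow: it assumes the OCR step leaves one plain-text file per page in a folder, with the page number somewhere in the file name (e.g. `page_12.txt`). The folder layout, the file naming, and the function names `page_number`, `build_html`, and `assemble` are all made up for illustration.

```python
import html
import re
from pathlib import Path


def page_number(path: Path) -> int:
    """Pull the first run of digits out of a file name, e.g. page_12.txt -> 12."""
    match = re.search(r"\d+", path.stem)
    return int(match.group()) if match else 0


def build_html(title: str, pages: list[tuple[int, str]]) -> str:
    """Turn (page number, OCR text) pairs into one HTML document."""
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        f"<title>{html.escape(title)}</title></head><body>",
    ]
    for number, text in sorted(pages):
        parts.append(f'<section id="page-{number}">')
        parts.append(f"<h2>Page {number}</h2>")
        # Blank lines in the OCR output are treated as paragraph breaks.
        for block in re.split(r"\n\s*\n", text.strip()):
            if block.strip():
                # Collapse hard line wraps and escape <, > and &.
                parts.append(f"<p>{html.escape(' '.join(block.split()))}</p>")
        parts.append("</section>")
    parts.append("</body></html>")
    return "\n".join(parts)


def assemble(folder: str, title: str) -> str:
    """Read every .txt file in a folder (assumed layout) and merge them in page order."""
    files = Path(folder).glob("*.txt")
    pages = [
        (page_number(f), f.read_text(encoding="utf-8", errors="replace"))
        for f in files
    ]
    return build_html(title, pages)
```

Something like `Path("manual.html").write_text(assemble("ocr_output", "Parts Manual"), encoding="utf-8")` would then write one HTML file per manual. It only handles plain text; tables and diagrams in the scans would still need hand work.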

JasonD

  • Inner Core
  • Hero Member
  • *
  • Posts: 1420
  • Look at THAT!!!!
    • AOL Instant Messenger - JasonDDuke
    • View Profile
    • Domain Names
    • Email
Re: WinXp pdf to html converter?
« Reply #6 on: February 11, 2013, 05:18:51 PM »
http://pdftohtml.sourceforge.net/

I'm not so sure the text will end up as true HTML rather than simply images without OCR, though.

littleman

  • Administrator
  • Hero Member
  • *****
  • Posts: 6552
    • View Profile
Re: WinXp pdf to html converter?
« Reply #7 on: February 11, 2013, 05:27:51 PM »
You could just break down and get yourself a Linux box, RC.

rcjordan

Re: WinXp pdf to html converter?
« Reply #8 on: February 11, 2013, 07:14:39 PM »
>not so sure the text will end up as true html

Yeah, I'd seen that Linux script and wondered the same thing. I'm thinking that it's meant for more recent types of PDF that were converted from Word docs or RTF.

>linux box

I'm sure I have one in a pile around here somewhere, hhh.

Rumbas

Re: WinXp pdf to html converter?
« Reply #9 on: February 13, 2013, 09:39:47 PM »
Fiverr.com?

rcjordan

Re: WinXp pdf to html converter?
« Reply #10 on: February 14, 2013, 02:36:03 AM »
>5r

Looking more and more like it.

I did try vsisoft today; it did spit out all the images numbered by the page it extracted them from. That could be handy. But the job is still a massive PITA. I'll probably trash the idea.
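If the images really are numbered by page, grouping them back into page order is the easy part. A minimal Python sketch follows, assuming file names like `page3_img1.jpg`; that pattern is a guess for illustration, not the tool's documented output, and `group_by_page` and `image_tags` are hypothetical helper names.

```python
import html
import re
from collections import defaultdict

# Assumed naming scheme: "page" followed by the page number, e.g. page3_img1.jpg.
PAGE_PATTERN = re.compile(r"page[_-]?(\d+)", re.IGNORECASE)


def group_by_page(filenames):
    """Map page number -> sorted image names, skipping names with no page number."""
    pages = defaultdict(list)
    for name in filenames:
        match = PAGE_PATTERN.search(name)
        if match:
            pages[int(match.group(1))].append(name)
    # Sort numerically so page 10 comes after page 2, not before it.
    return {page: sorted(names) for page, names in sorted(pages.items())}


def image_tags(filenames):
    """Emit one <img> tag per image, grouped under a comment for each page."""
    lines = []
    for page, names in group_by_page(filenames).items():
        lines.append(f"<!-- page {page} -->")
        for name in names:
            src = html.escape(name, quote=True)
            lines.append(f'<img src="{src}" alt="Page {page} scan">')
    return "\n".join(lines)
```

The numeric sort is the main point: a plain alphabetical sort would put `page10` ahead of `page2`.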

JasonD

Re: WinXp pdf to html converter?
« Reply #11 on: February 14, 2013, 09:42:52 PM »
https://crocodoc.com/

Just read about it, then played with it - Looks very interesting

werty

  • Inner Core
  • Full Member
  • *
  • Posts: 104
    • View Profile
Re: WinXp pdf to html converter?
« Reply #12 on: February 19, 2013, 03:16:04 AM »
You also might be able to use: http://finereader.abbyy.com/

That came w/ my scanner and turns all the PDFs into searchable PDFs and does the OCR and whatnot. Then you could copy and paste.

Same with Evernote. Not sure if the free one will do OCR or not, but you could then cut and paste.

This also seems like it may do a pretty nice job: http://www.verypdf.com/app/pdf-to-html-converter/index.html