A metric crapload. Parts manuals, brochures, tons of them. The PDFs are just scans done a few years back, so the content is basically page after page of images.
I found one program that broke each page out into an individual JPG, which I could then OCR. That worked, but reassembling the pages in HTML by hand is going to be too much work.
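
For what it's worth, the reassembly step might be scriptable. Here is a rough sketch, assuming the OCR text for each page has been saved as a plain text file with a zero-padded name like `page_001.txt` (that naming, the folder layout, and the function names `pages_to_html` and `load_pages` are all made up for illustration). It only stitches the text together into one HTML file, one section per page; it doesn't try to recover tables or layout.

```python
import html
from pathlib import Path


def pages_to_html(pages, title="Scanned manual"):
    """Join per-page OCR text into a single HTML document string."""
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        f"<title>{html.escape(title)}</title></head><body>",
    ]
    for number, text in enumerate(pages, start=1):
        parts.append(f'<section id="page-{number}">')
        parts.append(f"<h2>Page {number}</h2>")
        # Blank lines in the OCR output are treated as paragraph breaks.
        for para in text.split("\n\n"):
            para = para.strip()
            if para:
                # Escape &, <, > so stray OCR characters can't break the markup.
                parts.append(f"<p>{html.escape(para)}</p>")
        parts.append("</section>")
    parts.append("</body></html>")
    return "\n".join(parts)


def load_pages(folder):
    """Read page_*.txt files (hypothetical naming) in filename order."""
    files = sorted(Path(folder).glob("page_*.txt"))
    return [f.read_text(encoding="utf-8") for f in files]


if __name__ == "__main__":
    demo = ["Part No. 123\n\nBolt, 5 < 10 mm", "Page two text"]
    print(pages_to_html(demo, title="Demo"))
```

With real files it would be something like `Path("manual.html").write_text(pages_to_html(load_pages("ocr_out")), encoding="utf-8")`, where `ocr_out` is whatever folder the OCR text landed in. The filename sort only gives the right page order if the numbers are zero-padded.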