As a market researcher with a degree in sociology, empirical social research, and statistics from LMU Munich, Germany, I have 12+ years of survey research expertise. Previously, I worked at KANTAR, a leading data science and insights company based in London.
My regular tasks include the automated preparation, cleaning, processing, statistical analysis, and visualisation of complex data of varying quality from heterogeneous sources (databases, flat files, MS Excel, MS Access, APIs, PDF).
To manage this, I develop scripts and reusable programs (awk, bash, SQL, Python) that streamline and automate the entire process from raw data to the delivery file.
To extract all information from the 300 PDF documents you provide, I will use a well-proven program that extracts the content of PDF files and converts it to text files. I will then process the text files automatically and write the desired information into structured CSV or Excel files (both are possible). Once the process is set up, the extraction is fast and reliable, less prone to error than manual data extraction, and very competitive in price. We would need to agree on the structure of the output files.
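To illustrate the second step (text files to structured CSV), here is a minimal Python sketch. It assumes the PDF-to-text conversion has already been done and that each document contains labelled lines such as "Invoice No: ...". The field names, labels, and sample texts are hypothetical placeholders; the real patterns would follow the file structure we agree on.

```python
import csv
import io
import re

# Hypothetical field patterns; the real ones depend on the agreed structure.
FIELD_PATTERNS = {
    "invoice_no": re.compile(r"^Invoice No:[ \t]*(.+)$", re.MULTILINE),
    "date": re.compile(r"^Date:[ \t]*(.+)$", re.MULTILINE),
    "total": re.compile(r"^Total:[ \t]*(.+)$", re.MULTILINE),
}


def extract_fields(text):
    """Return one record per document; missing fields become empty strings."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[name] = match.group(1).strip() if match else ""
    return record


def records_to_csv(records):
    """Serialise the records as CSV text with one column per field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(FIELD_PATTERNS), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample texts standing in for the converted PDF documents.
sample_texts = [
    "Invoice No: A-1001\nDate: 2020-03-01\nTotal: 149.90\n",
    "Invoice No: A-1002\nTotal: 20.00\n",
]

records = [extract_fields(text) for text in sample_texts]
csv_output = records_to_csv(records)
print(csv_output)
```

In this sketch, a field that cannot be found is left empty rather than guessed, so incomplete documents are easy to spot and check by hand in the delivery file.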
Images can also be extracted from the PDF files and placed automatically into Excel files, but this would be billed separately.