Develop a resume parser for a specialized type of resume
$30-250 USD
In progress
Posted almost 9 years ago
Paid on delivery
I have thousands of resumes in PDF format that need to be scanned and converted to XML. All of the resumes follow a similar format, describe the same type of candidate, and are in English.
I have specific needs for the resume parsing. A typical resume parser concentrates on work experience and pays little attention to related areas such as academic awards and hobbies. The parser I need should concentrate on the things a normal HR resume parser ignores: the person's hobbies and academic qualifications, plus a guess at the person's age, gender, etc. Work experience still matters, but less than the other information.
I have attached sample files from publicly available resumes that resemble the type of resumes that need to be parsed to give you a better idea of what we need to do.
Further details will be provided upon request.
I've done a lot of work with Python and data parsing. I did some research and found that the most reliable way to extract the text from a PDF is the xpdf package, which includes a binary that converts PDF to text. After that, all that remains is to load the text into Python and find a way to guess the information you want.
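As a rough sketch of that extraction step, the snippet below calls the `pdftotext` binary that ships with xpdf. It assumes `pdftotext` is installed and on the PATH; the function names `pdftotext_command` and `extract_text` are made up for illustration.

```python
import subprocess


def pdftotext_command(pdf_path, layout=True):
    """Build the pdftotext command line (hypothetical helper)."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if layout:
        # -layout tries to keep the physical layout of the page,
        # which helps when resumes use columns or aligned dates.
        cmd.append("-layout")
    # "-" as the output file sends the text to stdout.
    cmd += [str(pdf_path), "-"]
    return cmd


def extract_text(pdf_path):
    """Run pdftotext and return the extracted text as a string."""
    result = subprocess.run(
        pdftotext_command(pdf_path), capture_output=True, check=True
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Calling `extract_text("resume.pdf")` would then return the plain text for the later guessing steps. Scanned, image-only PDFs would need OCR instead, which this sketch does not cover.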
For age, I think the graduation years from school would be a good starting point, with adjustments based on other factors such as the vocabulary used.
For gender, we can use the degree type and work experience, with probabilities, to determine the likely gender.
Any other classification can also be done once all the text is loaded into the Python script.
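A minimal sketch of the graduation-year idea might look like the following. The degree keywords, the assumed graduation age of 22, and the function name `estimate_age` are all illustrative assumptions, not something taken from the sample resumes.

```python
import re
from datetime import date

# Four-digit years from 1950 to 2099.
YEAR_RE = re.compile(r"\b(?:19[5-9]\d|20\d{2})\b")
# Assumed spellings of a bachelor's degree; a real list would be longer.
DEGREE_RE = re.compile(r"\b(?:bachelor\w*|B\.?A|B\.?Sc?|B\.?Eng)\b", re.IGNORECASE)


def estimate_age(text, today_year=None, typical_grad_age=22):
    """Guess an age from the year on a bachelor's-degree line.

    Returns None when no usable degree line is found.
    """
    today_year = today_year or date.today().year
    grad_years = []
    for line in text.splitlines():
        if not DEGREE_RE.search(line):
            continue
        years = [int(y) for y in YEAR_RE.findall(line) if int(y) <= today_year]
        if years:
            # For a range such as "2006 - 2010", the later year is graduation.
            grad_years.append(max(years))
    if not grad_years:
        return None
    # With several degree lines, the earliest graduation is the safest anchor.
    return today_year - min(grad_years) + typical_grad_age
```

For example, a line reading "Bachelor of Science, 2006 - 2010" parsed in 2016 would give an estimate of 28. The result is only a starting point and would still need the tweaking described above.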
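Once the fields have been guessed, writing them out as the XML the project asks for could be sketched as below. The element names (`resume`, `item`, and the field keys) are hypothetical, since the required XML schema is not specified in the posting.

```python
import xml.etree.ElementTree as ET


def resume_to_xml(fields):
    """Serialize a dict of extracted fields to an XML string.

    List values become a parent element with one <item> child each;
    None values are skipped.
    """
    root = ET.Element("resume")
    for name, value in fields.items():
        if isinstance(value, (list, tuple)):
            parent = ET.SubElement(root, name)
            for item in value:
                ET.SubElement(parent, "item").text = str(item)
        elif value is not None:
            ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


print(resume_to_xml({"estimated_age": 28, "hobbies": ["chess", "rowing"]}))
```

`ElementTree` escapes special characters such as `&` and `<` in the text automatically, which matters for free-form resume content.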
Hello, how are you?
I saw your description and the sample PDFs.
I think the main point is to extract the text from the PDFs
and then convert it to XML format.
I can do this well.
I would like to discuss it with you.
Please contact me.
I'll wait for your reply.
Bye, Huang.
I am a student pursuing my degree, so I have free time to work, and I am also working on a Python-based project. I am familiar with Python's regular expressions module.