The task is to write code that I can run on my computer to convert .txt files to .csv files. Both file types should use UTF-8 text encoding.
Overall Goal:
I explain in detail further below, but this is the summary of the desired output.
Convert each TXT (UTF8) file to a CSV (UTF8), with:
• 18 columns containing the data from the metadata section
• one column containing the number of the paragraph within the document
• one column containing the text
• each paragraph from the text should have its own row
• the breaks between paragraphs should occur wherever there are two or more consecutive line/paragraph breaks in the text, immediately before <quotation>, <interviewer>, or <other>, and immediately after </quotation>, </interviewer>, or </other>.
The Body of Text
In the CSV, I want this body of text to be entered into a column with the name "Text". And, I want each paragraph in the body of text to have a separate row.
There will be many cases where converting scanned PDFs to TXT files introduced line breaks or paragraph breaks where they shouldn't be. So, I instructed the research assistants to put two or more line/paragraph breaks between paragraphs. Your code should therefore treat a single line/paragraph break as though it weren't there, and treat two or more consecutive line/paragraph breaks as a real break between paragraphs.
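A minimal Python sketch of this rule, offered as one possible reading rather than a finished solution. The function name `split_paragraphs` is hypothetical. It assumes that a stray single line break should be replaced by one space, and that a "blank" line may still contain spaces or tabs.

```python
import re


def split_paragraphs(body: str) -> list[str]:
    """Split text into paragraphs on two or more consecutive line breaks."""
    # Normalise Windows and old Mac line endings so only "\n" remains.
    body = body.replace("\r\n", "\n").replace("\r", "\n")
    # Two or more line breaks (blank lines may hold spaces or tabs)
    # mark a real paragraph break.
    chunks = re.split(r"\n[ \t]*(?:\n[ \t]*)+", body)
    paragraphs = []
    for chunk in chunks:
        # A single line break is treated as a scanning artifact:
        # all runs of whitespace inside a paragraph collapse to one space.
        text = " ".join(chunk.split())
        if text:
            paragraphs.append(text)
    return paragraphs
```

Note that collapsing whitespace also removes double spaces inside a paragraph; if those need to be preserved, only the line breaks should be replaced.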
The research assistants wrote codes to distinguish different speakers. In your output CSV, these should cause a break in the paragraphing. Always break up the paragraphing immediately BEFORE the following:
• <quotation>
• <interviewer>
• <other>
Always break up the paragraphing immediately AFTER the following:
• </quotation>
• </interviewer>
• </other>
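One way to handle the speaker codes, sketched in Python under stated assumptions: a blank line is inserted before each opening code and after each closing code, so that whatever rule splits on blank lines also splits at the codes. The function name `add_tag_breaks` is hypothetical. The sketch assumes the codes are always lower case and spelled exactly as listed, and it leaves the codes in the text, since the brief does not say to remove them.

```python
import re

# The three speaker codes named in the brief.
OPENING = re.compile(r"(<(?:quotation|interviewer|other)>)")
CLOSING = re.compile(r"(</(?:quotation|interviewer|other)>)")


def add_tag_breaks(body: str) -> str:
    """Force a paragraph break before opening codes and after closing codes."""
    body = OPENING.sub(r"\n\n\1", body)   # blank line BEFORE <quotation> etc.
    body = CLOSING.sub(r"\1\n\n", body)   # blank line AFTER </quotation> etc.
    return body


if __name__ == "__main__":
    sample = "Intro <quotation>Hi</quotation> outro"
    parts = [p.strip() for p in re.split(r"\n\s*\n", add_tag_breaks(sample)) if p.strip()]
    print(parts)
```

Running the inserted breaks through a blank-line split gives three paragraphs for the sample: the text before the quotation, the quotation itself, and the text after it.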
The Metadata
The metadata section starts with <code> and ends with </code> .
I want each variable in the metadata section to have its own column. In each of these columns, within a given document, the value entered will repeat from row to row. (This is because the metadata will be the same for every paragraph in a given document.)
In the metadata, there are slots for 18 variables (so they should result in 18 columns), as follows:
1. Research Assistant Name [Family, Given]:
2. Title of Source Book:
3. Publisher:
4. Publisher's City:
5. Publication Date [yyyy-mm-dd]:
6. Title of Source Website:
7. URL:
8. Date Retrieved on [yyyy-mm-dd]:
9. Leader’s Name [Family, Given]:
10. Country:
11. Political Office:
12. Political Party:
13. Title:
14. Date Delivered [yyyy-mm-dd]:
15. Type of Text:
16. Scope of Audience:
17. Circle of Audience:
18. MISCELLANEOUS
The values for each variable are entered after the colon in the variable's name. The exception to this is MISCELLANEOUS, which doesn't have a colon, but in this case the value entered is everything between the variable name (MISCELLANEOUS) and </code> .
There are headings for the variables. These are to help the research assistants who entered the data, and you don't need to use them. But your code should be written so that it does not pick up these headings as part of the values of the variable above them. These headings are as follows:
• PUBLICATION INFORMATION
• LEADER AND COUNTRY INFORMATION
• SPEECH/INTERVIEW INFORMATION
I have instructed my research assistants to not include colons in the values that they enter, and to not include line breaks or new paragraphs in the values that they enter (with the exception of the MISCELLANEOUS section, which does have line breaks and new paragraphs).
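A hedged Python sketch of how the metadata section could be read. The function name `parse_metadata` is hypothetical, and the exact layout of the template is an assumption: each label is taken to start its own line, optionally preceded by a number such as "1. ". Values are taken as everything after the known label rather than after the first colon, so a URL value (which contains a colon) is still read in full. Curly apostrophes are normalised to straight ones, so the column name here is "Leader's Name [Family, Given]".

```python
import re

# The 17 colon-terminated labels from the brief, in order.
FIELDS = [
    "Research Assistant Name [Family, Given]",
    "Title of Source Book",
    "Publisher",
    "Publisher's City",
    "Publication Date [yyyy-mm-dd]",
    "Title of Source Website",
    "URL",
    "Date Retrieved on [yyyy-mm-dd]",
    "Leader's Name [Family, Given]",
    "Country",
    "Political Office",
    "Political Party",
    "Title",
    "Date Delivered [yyyy-mm-dd]",
    "Type of Text",
    "Scope of Audience",
    "Circle of Audience",
]
MISC = "MISCELLANEOUS"  # the 18th variable, which has no colon


def parse_metadata(raw: str) -> tuple[dict[str, str], str]:
    """Return (metadata, body) for the full text of one document."""
    match = re.search(r"<code>(.*?)</code>", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no <code>...</code> metadata section found")
    block = match.group(1).replace("\u2019", "'")  # curly -> straight apostrophe
    body = raw[match.end():]

    # MISCELLANEOUS takes everything up to </code>, line breaks included.
    head, _, misc = block.partition(MISC)
    meta = {name: "" for name in FIELDS}
    meta[MISC] = misc.strip()

    for line in head.splitlines():
        # Drop an optional leading "1. " style number (template assumption).
        line = re.sub(r"^\s*\d+\.\s*", "", line).strip()
        for name in FIELDS:
            label = name + ":"
            if line.startswith(label):
                meta[name] = line[len(label):].strip()
                break
        # Lines matching no label (the headings, blank lines) are ignored.
    return meta, body
```

Because each label is matched together with its colon, "Title:" does not collide with "Title of Source Book:", and the three headings never end up inside a value.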
Count of the Paragraphs Within the Document
I also want a column that gives the count of the paragraph within the document, e.g., the first paragraph in the document is 1, the second in the document is 2, etc., and when we move on to the next document, it starts from 1 again.
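The numbering and the UTF-8 output could look roughly like the Python sketch below. The function name `write_document_csv` and the column name "Paragraph Number" are hypothetical (only the "Text" column name comes from the brief). It assumes the metadata arrives as a dictionary of column name to value and the paragraphs as a list of strings for one document.

```python
import csv
from pathlib import Path


def write_document_csv(meta: dict[str, str], paragraphs: list[str], out_path: Path) -> None:
    """Write one row per paragraph, repeating the metadata on every row."""
    fieldnames = list(meta) + ["Paragraph Number", "Text"]
    # newline="" lets the csv module control line endings; encoding is UTF-8.
    with open(out_path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        # enumerate restarts at 1 for every document passed to this function.
        for number, text in enumerate(paragraphs, start=1):
            writer.writerow({**meta, "Paragraph Number": number, "Text": text})
```

If the CSV files will be opened in Excel, the "utf-8-sig" encoding may be worth trying instead, since it adds a byte-order mark that helps Excel recognise the file as UTF-8.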
Attached: template of the entry of the metadata, 4 examples of input TXT, and example of desired output CSV.
Hi there.
I am an expert in this field.
I know which skills your project requires.
You can confirm this from my previous clients' reviews.
Hope to contact you.
First, thank you for the excellent project description!
I can provide you with a simple-to-use Python script that will convert your TXT files into CSV files the way you want. I can complete it in 1-2 days for just $200 CAD. Could you give me a dozen files for testing?
Roman
*** Python Expert for your project : TXT to CSV ***
I read your project description very carefully.
I have a deep understanding of, and experience in, the areas of Python that you mentioned.
I've previously worked on many projects for other employers.
Here is my profile URL: https://www.freelancer.com/u/Fazeennazar
Check out my past reviews and skills.
So, I would like to go through more specific discussions with you to provide successful results.
Thank you, Mohamed F.
Hello, Client!
I have read your project description very carefully and feel a great interest in your project.
As a senior c++/python developer with 8 years of experience, I can handle your project perfectly.
I can start immediately and can bring you full time service.
Let’s discuss your project in more detail in the chat box. Feel free to contact me.
Thanks & Regards
~~~~~~~~~~~~~~~~~ Satisfied Clients Are All Of My Business! ~~~~~~~~~~~~~~~~~~
"Python & Excel guru is here!"
Hello, I just went through your job post carefully, and it piqued my interest.
Your description of the work responsibility and the role closely match my skills.
As an experienced senior Python developer, I can deliver perfect results on time.
If you want, I will share my work experiences. Can I have a chat with you to make some points clear?
I am looking forward to your message.
Thanks.
------------------- HERE!!! -----------------
Hello there. I am Roman R, the person you are looking for. I am glad to take on this task. I am a Python expert and have worked with it for over 7 years. I can make anything using Python! I have checked your project description carefully, and I think I can complete this project to fully satisfy your requirements. I'd like to have a brief chat or call to discuss more details about your project soon.
Looking forward to working with you together on this project.
Thanks and Regards
⚡️⚡️ I checked your project details and my skills fit your project. ⚡️⚡️
Hi, I have been working with Python and data handling for over 5 years.
I have experience working on similar projects and can show the working results.
I will do my best to get the good result you want.
Please contact me for a good result.
Thanks.
Dear client!
I have seen your job posting and think I am a very good fit for this job.
I am a full-stack web developer and have rich experience in Python programming.
So I am sure I can convert TXT files to CSV files perfectly.
I sincerely hope to work with you.
Best regards.
Hello,
I'm a Python developer with 2+ years of experience programming in Python.
I will create a script to convert .txt files to .csv files with UTF-8 encoding and all of the desired output.
I will write this script for $70 CAD.
Looking forward to working with you.
Best regards,
Mohamed
Greetings!
Thanks for your job posting.
I read your project details and understood them.
I can help you with your project at a high level!
I have rich experience in Python development.
My skills and experience are a great match for your needs. Looking forward to hearing from you.
Best regards.
thanks.
Your requirements are very clear, and I'm ready to write a simple console script that does the conversion for you, aiming to deliver it well before the end of today. Let me know if you need it urgently, because I'm free to take on the project just about anytime.
Hello, I am a Python, C and C++ programmer. I have taken note of the project's requirements and will provide you with the Python program in a relatively short time. Feel free to contact me.
Regards.
Hello
How are you?
I have read your project description in detail and I am sure it is not difficult for me, so I am bidding now.
As you can see from my Freelancer profile, I am an expert in full-stack development and blockchain, with strong coding skills.
I am proficient in Python, PHP, JavaScript, Ruby on Rails, and spreadsheets.
While reading the description, I understood that a great candidate for this position needs to create and follow best practices and a standard style guide, and write clean and maintainable code.
I highly prioritize the performance and maintainability of the projects I structure and the code I write.
That's why all of my previous projects are easily maintainable, scalable and modular.
I am very honest and patient, and have good communication skills.
My goals are timeliness and quality, to best satisfy customer needs. Let's begin the glorious journey of working together.
I am also ready to start work right now, and I can work full-time in your time zone.
Give me an opportunity to discuss the project in detail.
Please feel free to contact me anytime.
Looking forward to working with you.
Best Regards.
Hello there, I have gone through this project description and see that you are looking for someone to convert .txt files to .csv. I have over 3 years of experience in this field and the required qualifications. Message me in the chat box for further discussion so we can start the project. Regards, Daniyal
Hello, I am pursuing a PhD, and during this time I have gained a lot of experience dealing with data, including converting text to CSV and vice versa. Do let me know.