
Python web scraping/crawling on nofap.com

$30-250 USD

Closed
Posted nearly 6 years ago

Paid on delivery
I need to save a snapshot (all HTML files) of the website [login to view URL] It is an online forum that allows people to post and follow each other. I want to save the following information:

1. [login to view URL] saved as '[login to view URL]'.

2. On the index page there are 23 forums (counting Porn Addiction and Porn-Induced Sexual Dysfunctions as two separate forums). I need all pages of all threads in each of the 23 forums saved. For example, the first forum is shown as "Rebooting - Porn Addiction Recovery". Clicking on it leads to [login to view URL] The trailing number 2 in that link is an identifier. I want this page saved to "[login to view URL]". There are 583 pages of threads (posts) in this forum; save them to "[login to view URL]" through "[login to view URL]". Each of these pages lists 50 threads (a few more on the first page because of announcements at the top). Each of the 50+ threads may itself span multiple pages, and I need all of those HTML pages saved as well. For example, the first post is "[login to view URL]". The trailing number 88344 is also an identifier; save its pages to "[login to view URL]" through "[login to view URL]" (5 pages in this thread).

3. I also want every user profile page saved. The website ([login to view URL]) shows 156,726 members, and you can enumerate all of them from 1 to 156726 using the following link (for user 1): [login to view URL] On each profile page I need the HTML for the 5 tabs: "Profile Posts" (may span multiple pages; all pages needed), "Recent Activity" (click "Show older items" at the bottom until the button disappears so that everything is captured), "Postings" (no need to crawl these exhaustively, since all postings are captured in the previous step), "Information", and "Groups". In addition, I want the user_ids of each user's "Following" and "Followers".
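The forum/thread numbering scheme above can be sketched as a pure enumeration of (URL, filename) pairs. The concrete URLs are redacted in the posting ("[login to view URL]"), so the base URL, the path pattern, and the filename convention below are illustrative assumptions, not the real site layout:

```python
# Sketch only: BASE_URL and the path/filename patterns are assumptions,
# since the actual forum URLs are redacted in the posting.
BASE_URL = "https://example-forum.com"  # assumed host

def forum_page_url(forum_id: int, page: int) -> str:
    """URL of one page of a forum's thread listing (assumed XenForo-style path)."""
    return f"{BASE_URL}/forums/{forum_id}/page-{page}"

def forum_page_filename(forum_id: int, page: int) -> str:
    """Local filename for that listing page, e.g. 'forum-2-page-1.html'."""
    return f"forum-{forum_id}-page-{page}.html"

def thread_page_filename(thread_id: int, page: int) -> str:
    """Local filename for one page of a thread, e.g. 'thread-88344-page-5.html'."""
    return f"thread-{thread_id}-page-{page}.html"

def forum_jobs(forum_id: int, n_pages: int):
    """Yield (url, filename) download jobs for every listing page of one forum."""
    for page in range(1, n_pages + 1):
        yield forum_page_url(forum_id, page), forum_page_filename(forum_id, page)

# Example: the "Rebooting" forum (identifier 2) has 583 listing pages.
jobs = list(forum_jobs(2, 583))
```

A worker pool (one thread per forum, as the posting suggests) could then consume these jobs with an authenticated `requests.Session`.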
For example, user 1 follows 8 other users and is followed by 826 users. I want 2 tables (CSV or SQLite) to store the Following/Followers information, each with 2 columns. Following table: user_id, following_user_id; Followers table: user_id, follower_user_id. Only 20 users are shown per page of the Following/Followers lists, so you need to click the "more" button repeatedly to enumerate them all.

Requirements:

1. The program should finish within 24 hours (multithreading may be needed; for example, several threads could each handle a forum while one thread handles the user profile pages). The shorter the runtime, the better, because I plan to scrape the website on different days to track changes in users and posts.
2. Since I will scrape the site on different days, some form of incremental scraping would be great. The first run would save everything; subsequent runs would keep "diff"-type files recording what was deleted (users, follow relationships, threads). That saves a lot of disk space, since duplicate HTML files that are already saved need not be stored again.
3. Python 3.5+ and any other packages you find necessary.
4. The program should log in to the forum before saving the HTML files. Registration is free; login credentials can be provided upon request.
5. The program will run on Linux (Ubuntu).
6. Clear comments in the code so that I can modify it later.
7. Object-oriented design is preferred.
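The two relationship tables and the "diff"-style incremental bookkeeping described above might be sketched as follows. The table and column names come from the posting; the SQLite layout, the function names, and the set-based diff are assumptions for illustration:

```python
import sqlite3

def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the two tables requested in the posting (schema is as specified;
    the rest of this sketch is an assumption)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS following "
                 "(user_id INTEGER, following_user_id INTEGER)")
    conn.execute("CREATE TABLE IF NOT EXISTS followers "
                 "(user_id INTEGER, follower_user_id INTEGER)")
    return conn

def save_following(conn: sqlite3.Connection, user_id: int, ids) -> None:
    """Record every user that `user_id` follows."""
    conn.executemany("INSERT INTO following VALUES (?, ?)",
                     [(user_id, f) for f in ids])

def diff_snapshot(old_ids: set, new_ids: set) -> dict:
    """Incremental-scrape bookkeeping: which items (users, threads, follow
    edges) vanished or appeared between two runs. Only the deltas need to be
    persisted; unchanged items need no re-download."""
    return {"deleted": old_ids - new_ids, "added": new_ids - old_ids}

conn = open_db()
save_following(conn, 1, [5, 9, 12])
rows = conn.execute(
    "SELECT * FROM following ORDER BY following_user_id").fetchall()

# Between two scraping days: user 1 disappeared, user 4 registered.
change = diff_snapshot(old_ids={1, 2, 3}, new_ids={2, 3, 4})
```

Storing only the `deleted`/`added` sets per run, rather than a full second copy of every HTML file, is what makes the day-over-day comparison cheap on disk.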
Project ID: 16684496

About this project

16 proposals
Remote project
Active 6 years ago

16 freelancers are bidding an average of $259 USD for this job
Dear Sir, how are you? I am very interested in your project and am ready to start right away. I have experience in Python development and web scraping. I will work very hard for you. Best regards
$155 USD in 3 days
5.0 (53 reviews)
7.0
Hi there, I just checked the project details and I'm very interested in discussing them with you. I have great knowledge of web scraping and I use Python. Feel free to PM me so that we can discuss and share sample work! Regards, Sohan D.
$250 USD in 3 days
5.0 (211 reviews)
7.2
A proposal has not yet been provided
$30 USD in 2 days
5.0 (127 reviews)
6.9
Hello, I have good knowledge of Python web scraping/crawling. I have more than 5 years of experience in Python and web scraping. We have worked on several similar projects before, 300+ projects in total; please check the profile reviews. I can deliver your job within your deadline. Please ping me to discuss further. I can assure 100% job satisfaction. Thanks
$300 USD in 3 days
4.9 (33 reviews)
6.1
Can do it with Selenium/Scrapy or BeautifulSoup in Python, whichever you want.
$100 USD in 3 days
4.9 (38 reviews)
5.1
Hello, I suggest implementing the crawler in Java to support any OS (Linux and Windows). The crawler will be multithreaded and will output an XLS file or a database file, as you prefer. I invite you to discuss more over chat. Thank you in advance
$150 USD in 3 days
4.6 (26 reviews)
4.8
I have 6 years of experience on the Freelancer, Upwork, Fiverr and 99designs marketplaces. I have seen your project and can do it easily, as I have a lot of experience in graphic design, web design, web development and programming. Please discuss with me before starting the job.
$155 USD in 3 days
0.0 (0 reviews)
0.0
I am pretty much familiar with the task and do similar things frequently.
$222 USD in 5 days
0.0 (0 reviews)
0.0
I checked all your requirements carefully and would be able to scrape the info. Highly skilled in Python. Rate: $18/hr. Let us discuss and start. Sandeep
$250 USD in 5 days
0.0 (0 reviews)
0.0

About the client

United States
0.0
0
Member since April 12, 2018

Client verification

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)