Scrape Wayback for AU domain name and first archived date

Cancelled. Posted 6 years ago. Paid on delivery.

Unless someone in the Freelancer community knows otherwise, there doesn't seem to be a way to query the AU TLD whois records for the creation date of Australian domain names, even with dig or any other command-line tool. My imperfect solution is to harvest the earliest Wayback archive entry for each domain instead.
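A minimal sketch of how such a query could be built, assuming the CDX server endpoint and the `url` and `limit` parameters described in the wayback-cdx-server README linked under the resources below. The function name `cdx_query_url` is hypothetical, and the sketch only builds the URL; it does not send the request.

```python
from urllib.parse import urlencode

# Endpoint as shown in the CDX server README (assumed to still be current).
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain: str) -> str:
    """Build a CDX query that asks for only the first capture of a domain.

    limit=1 keeps just the first result line, which the posting treats
    as the oldest archive entry.
    """
    return CDX_ENDPOINT + "?" + urlencode({"url": domain, "limit": 1})
```

Calling `cdx_query_url("example.com.au")` returns a URL that can be fetched with any HTTP client; with 2.1 million names, the fetching side would need its own pacing and error handling, which this sketch leaves out.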

Image 1 shows the two data elements I need to extract from the zonefile csv I have: (1) the domain name, and (2) the date the domain name was first archived in Wayback.

Using a script I found in the Wayback APIs, I have built a [clunky] batch script [[login to view URL], attached] that captures the data into a series of files, which I then batch rename to csv [[login to view URL] example attached].

Each capture from [login to view URL] produces a file with the oldest archive date on line 1. The domain name is obvious from each line, and the date is a 14-digit timestamp in YYYYMMDDHHMMSS format with no separators.
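As a rough sketch of picking out that first line, assuming each capture file holds the CDX server's default space-separated output (urlkey, timestamp, original URL, then further fields): the function name `first_capture` is hypothetical, and the field order is an assumption that should be checked against a real capture file.

```python
from typing import Optional, Tuple


def first_capture(capture_text: str) -> Optional[Tuple[str, str]]:
    """Return (original_url, timestamp) from the first non-empty line, or None.

    Assumes the default CDX field order: urlkey timestamp original ...
    """
    for line in capture_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines before the first record
        # Only the first non-empty line matters; reject it if malformed.
        if len(fields) >= 3 and len(fields[1]) == 14 and fields[1].isdigit():
            return fields[2], fields[1]
        return None
    return None  # empty capture: no Wayback record for this domain
```

Returning `None` for an empty or malformed capture lets the caller decide how to record domains that have no Wayback entry.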

It will need a chunk of regex written into the script to filter off the first line of each successful data capture, clean up the time data element into a readable format (e.g. 19981202014938 becomes 02/12/1998), and append the domain name and date to an external csv file.
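The date clean-up could look something like the sketch below, assuming the timestamp is always 14 ASCII digits and the readable format wanted is DD/MM/YYYY, as in the example. The function name `readable_date` is hypothetical.

```python
import re
from datetime import datetime

# A Wayback timestamp is assumed to be exactly 14 digits: YYYYMMDDHHMMSS.
TIMESTAMP_RE = re.compile(r"[0-9]{14}")


def readable_date(timestamp: str) -> str:
    """Convert a 14-digit timestamp such as 19981202014938 to DD/MM/YYYY."""
    if not TIMESTAMP_RE.fullmatch(timestamp):
        raise ValueError(f"not a 14-digit timestamp: {timestamp!r}")
    parsed = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return parsed.strftime("%d/%m/%Y")
```

Parsing through `datetime` rather than slicing the string also rejects impossible dates, such as month 13, instead of writing them to the csv.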

Some of the domains have no entry in Wayback, but the script will still need to write the URL to the csv with a 'nul' value as the date element, so I can see which have no Wayback records.
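One way to handle the missing entries, sketched under the assumption that the csv has two columns (domain, date) and that a missing date is passed in as `None`. The names `write_row` and `NO_RECORD` are hypothetical.

```python
import csv
from typing import Optional, TextIO

NO_RECORD = "nul"  # placeholder the posting asks for when Wayback has no entry


def write_row(out: TextIO, domain: str, date: Optional[str]) -> None:
    """Append one 'domain,date' row, substituting 'nul' when there is no date."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([domain, date if date else NO_RECORD])
```

Opening the output file once in append mode (`open(path, "a", newline="")`) and passing the handle to `write_row` for every domain keeps the csv growing across a long run.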

The script will preferably draw the urls from an external text file.
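Reading the URLs could be sketched as below, assuming the text file holds one domain per line and is UTF-8 encoded. The function name `iter_domains` is hypothetical.

```python
from typing import Iterator


def iter_domains(path: str) -> Iterator[str]:
    """Yield one domain per non-blank line of the input text file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            name = line.strip()
            if name:  # ignore blank lines
                yield name
```

A generator is used so that a list of a couple of million names is streamed line by line rather than loaded into memory at once.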

The successful bidder can use the URLs listed in the '[login to view URL]' file attached as a test list for the script. I don't need the scrape itself, just the script (there are 2.1 million domain names to query).

Any other questions, DM me.

Some resources:
https://github.com/internetarchive/wayback/blob/master/wayback-cdx-server/README.md#basic-usage

https://blog.archive.org/developers/
https://blog.archive.org/2013/07/04/metadata-api/
https://archive.org/help/json.php

Autobids will be deleted if the proposal is not read and acknowledged within 12 hours...

Reissuing this project as it should have been listed in AUD, not USD. Check for the new project just listed.

Data Entry, Excel, JavaScript, JSON, Shell Script

Project ID: #14356249

About the project

6 proposals. Remote project. Active 6 years ago.

6 freelancers are bidding on this job, at an average of $155/hour

Venkat2011sri

Hi, I have been working as a freelancer for 12 years and have completed 1500 projects. I assure you 100% accuracy in the delivered work. I look forward to working with you. Relevant Skills and Experience: Data Extraction. Proposed ...

$166 USD in 3 days
(184 reviews)
6.8
ChinmoySarker

Hi, having been attracted by your description of the program, I would like the chance to complete your work carefully and sincerely. I would like to have your kind consideration at present and as soon as possible. ...

$100 USD in 3 days
(28 reviews)
4.6
vietdevteam

I have read your project. I'm sure I can help you do it. I have completed many projects similar to this one. Relevant Skills and Experience: I am an expert in web scraping. I have created many scraping tools. I h...

$150 USD in 1 day
(8 reviews)
3.9
huongth

Hi. I am an expert in VBA, VBScript, Visual Basic, C#, F#, C, C++, ASM, Delphi, Java, iMacros, Flash, ASP, ASP.NET, Access, MySQL, MSSQL, QuickBooks, Oracle. I can create auto scripts to scrape websites, auto click, fo...

$150 USD in 3 days
(16 reviews)
3.7