PBS Teachers is the website for PBS Education. It is a static site made up of hard-coded HTML pages.
We are interested in extracting the content from a set of pages located at <[login to view URL]> and capturing it in a structured data format.
Essentially, for the project, we need a person to:
- Write a script that reads the pages and grabs the data: theme title, URL, activities text, etc.
- Port the data into a plain-text (ASCII) format, such as comma-delimited or tab-delimited, that can be imported into a database.
Although the page code is consistent, the pages do not contain XML tags or other clear identifiers.
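
To make the two steps above concrete, here is a minimal sketch in Python using only the standard library. Because the pages carry no clear identifiers, the markup it looks for is purely an assumption for illustration: it treats the text of `<h2>` as the theme title and the text of `<p>` elements as the activities text. The names `ThemePageParser`, `extract_row`, and `rows_to_csv` are hypothetical, and the real pages will almost certainly need different matching rules.

```python
import csv
import io
from html.parser import HTMLParser


class ThemePageParser(HTMLParser):
    """Collects text found inside <h2> and <p> elements.

    ASSUMPTION: <h2> holds the theme title and <p> holds activities text.
    The actual page markup is unknown and may differ.
    """

    def __init__(self):
        super().__init__()
        self._current = None        # tracked tag we are inside, if any
        self.title_parts = []
        self.activity_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "p"):
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "h2":
            self.title_parts.append(data)
        elif self._current == "p":
            self.activity_parts.append(data)


def extract_row(url, html):
    """Return [theme title, url, activities text] for one page."""
    parser = ThemePageParser()
    parser.feed(html)
    parser.close()
    # Join the collected fragments and collapse runs of whitespace.
    title = " ".join(" ".join(parser.title_parts).split())
    activities = " ".join(" ".join(parser.activity_parts).split())
    return [title, url, activities]


def rows_to_csv(rows):
    """Serialize rows as comma-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["theme_title", "url", "activities_text"])
    writer.writerows(rows)
    return buffer.getvalue()
```

The `csv` module quotes any field that contains a comma or a line break, so free-form activities text should survive the import into a database. For tab-delimited output, `csv.writer` accepts `delimiter="\t"`.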
## Deliverables
Please note:
- This is not a software project, but that was the only category that fit. We need the data extracted and delivered to our developers, who will integrate it into a database. It is a simple extraction project.
- We need to move quickly and have this done within the next 2-3 weeks.