I have extensive experience in web scraping and in developing with Java and Python. Most of my projects have been deployed on Amazon Web Services (AWS), but I am comfortable with the Microsoft Azure platform as well.
I completed a similar project in Java several months ago:
My project scraped financial articles, extracted the plain text, and cataloged them. Since late 2013, the script has run continuously on an AWS EC2 instance, cataloging ~1,000,000 articles. If you want, I can provide more details about this project.
For your project, I would use a similar approach: a Java or Python program that scrapes each page, extracts the plain text, scans for keywords, translates, stores, and reports the results. I would deploy on AWS, using an EC2 instance (about $18 USD/month, or free for the first year if you are a new user) plus a NoSQL database. Depending on your needs, AWS can be quite inexpensive!
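As a rough illustration of the extract-and-scan steps, here is a minimal Python sketch using only the standard library. The class name, function names, and sample keywords are placeholders of mine, not part of your project, and fetching, translation, storage, and reporting are left out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_plain_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def find_keywords(text, keywords):
    """Return the keywords that occur in text (case-insensitive substring match)."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


if __name__ == "__main__":
    # Hypothetical page content; a real run would download the page first.
    sample = "<html><body><h1>Quarterly Earnings</h1><p>Revenue rose.</p></body></html>"
    text = extract_plain_text(sample)
    print(text)
    print(find_keywords(text, ["earnings", "merger"]))
```

In practice I would probably swap the hand-written parser for a dedicated HTML library and match keywords on word boundaries, but the overall flow would stay the same.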
I can determine which specific solutions are the best fit after we discuss your project further.
Feel free to message me to discuss the project or my experience/capabilities.
Regards,
William