Local Search Engine
Required: a search engine script written in PHP and using a MySQL database. The code must be well documented. Script requirements are:
Indexing
Able to index an entire site by reading files from a specified directory and all its subdirectories, excluding files/directories named in an ASCII stopfile list.
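A minimal Python sketch of the directory walk (the delivered script would be PHP; Python is used only to illustrate the logic). It assumes the stopfile holds one bare file or directory name per line and that a listed name is excluded wherever it appears in the tree. `load_list` and `collect_files` are hypothetical helper names.

```python
import os

def load_list(path):
    """Read an ASCII list with one entry per line, ignoring blank lines."""
    with open(path, encoding="ascii") as fh:
        return {line.strip() for line in fh if line.strip()}

def collect_files(root, stop_names):
    """Return the paths (relative to root, '/'-separated) of every file under
    root, skipping any file or directory whose bare name is in stop_names."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into excluded dirs.
        dirnames[:] = sorted(d for d in dirnames if d not in stop_names)
        for name in sorted(filenames):
            if name not in stop_names:
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                found.append(rel.replace(os.sep, "/"))
    return found
```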
Indexing must be a simple process and, for sites of up to 1,000 pages, should not take a significant amount of time (say, under 20 seconds).
The index must include every word in each file, but exclude the stop words listed in an ASCII stopword list.
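One way to apply the stopword list, sketched in Python under these assumptions: words are case-folded, may contain digits and the characters - _ . + (the characters the search requirements name), and lose a trailing full stop. `tokenize` and `index_words` are hypothetical names, and the stopword set is assumed to be lower-case.

```python
import re
from collections import Counter

# Assumption: a word starts with a letter or digit and may continue with
# letters, digits and the characters - _ . +
WORD_RE = re.compile(r"[a-z0-9][a-z0-9_.+\-]*")

def tokenize(text):
    """Lower-case text and split it into words, dropping any trailing full stop."""
    return [m.group(0).rstrip(".") for m in WORD_RE.finditer(text.lower())]

def index_words(text, stopwords):
    """Count every word in text except those in the stopword set."""
    return Counter(w for w in tokenize(text) if w not in stopwords)
```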
Indexing must include the option to exclude words within specified HTML tags. For example, excluding <a href> must exclude any text between the <a …> and </a> tags, so that menu titles are not indexed.
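A sketch of tag exclusion using Python's standard `html.parser.HTMLParser`: it counts how deeply it is nested inside excluded tags and drops text while that count is above zero. `TextExtractor` and `visible_text` are hypothetical names. Text fragments are joined with a space, so inline markup such as `Fred<b>'s</b>` would be split into separate words; that is a simplification.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text, skipping everything inside any tag named in exclude_tags."""

    def __init__(self, exclude_tags):
        super().__init__()
        self.exclude = {t.lower() for t in exclude_tags}
        self.depth = 0   # how many excluded tags we are currently inside
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.exclude:      # HTMLParser passes tag names in lower case
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.exclude and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.parts.append(data)

def visible_text(html, exclude_tags):
    """Return the page text with excluded tags removed and whitespace collapsed."""
    parser = TextExtractor(exclude_tags)
    parser.feed(html)
    parser.close()   # flush any buffered text
    return " ".join(" ".join(parser.parts).split())
```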
Indexing must offer the option of including or excluding meta tag data, or of indexing only the data within the <body> … </body> section of a page.
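A sketch of the meta/body option, again with `html.parser`. It assumes "meta tag data" means the `content` attribute of `<meta name="keywords">` and `<meta name="description">`, and that body text is everything between `<body>` and `</body>`. `PageParser` and `indexable_text` are hypothetical names.

```python
from html.parser import HTMLParser

class PageParser(HTMLParser):
    """Separate a page into meta-tag content and text inside <body>."""

    def __init__(self):
        super().__init__()
        self.meta = []        # content of keywords/description meta tags
        self.body = []        # text fragments found inside <body>
        self.in_body = False

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag == "meta":
            a = dict(attrs)
            # Attribute values can be None, hence the "or ''" guard.
            if (a.get("name") or "").lower() in ("keywords", "description") and a.get("content"):
                self.meta.append(a["content"])

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        if self.in_body:
            self.body.append(data)

def indexable_text(html, include_meta=True):
    """Return body text, optionally followed by meta-tag content."""
    p = PageParser()
    p.feed(html)
    p.close()
    parts = p.body + (p.meta if include_meta else [])
    return " ".join(" ".join(parts).split())
```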
The index should store the date of each indexed file so that a partial reindex can process only files added or updated since the last run. A partial reindex must also remove references to files that have been removed from the site.
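A minimal sketch of the partial-reindex decision, assuming the index stores one modification time per relative path (in the MySQL version this would be a column in a hypothetical pages table). A file is reindexed if it is new or its time differs from the stored one; stored paths that are no longer on disk are returned for deletion. `current_mtimes` and `plan_reindex` are hypothetical names.

```python
import os

def current_mtimes(root, rel_paths):
    """Read the on-disk modification time (seconds since the epoch) of each file."""
    return {p: os.path.getmtime(os.path.join(root, p)) for p in rel_paths}

def plan_reindex(current, stored):
    """Compare {path: mtime} now on disk with {path: mtime} saved at the last run.

    Returns (paths to (re)index, paths to delete from the index).
    "!=" rather than ">" also catches files restored with an older timestamp."""
    to_index = sorted(p for p, m in current.items() if stored.get(p) != m)
    to_delete = sorted(p for p in stored if p not in current)
    return to_index, to_delete
```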
The indexing script should indicate progress and confirm which pages have been indexed. There is no need for fancy output here; a predefined CSS file will control all formatting.
Searching
Searching the index will be by a supplied search string. It must be possible to search for a single word, multiple words and phrases. For example:
Search string “Fred” should find all instances of “Fred”, “*Fred”, “Fred*” or “*Fred*” in the index.
Search string “Fred Bloggs” should find all instances of Fred Bloggs where both words appear together (the search string will include the quotes in this case).
Search string Fred Bloggs (without quotes) will find instances of Fred and Bloggs on a page, but not necessarily together. Such matches will be ranked lower than an exact match.
The input string must handle phrases of up to 5 words separated by a space or punctuation character. Search must also handle digits and the characters “-”, “_”, “.” and “+”.
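A sketch of query parsing under these assumptions: text between double quotes (straight or typographic) forms one phrase, everything else is a list of loose words, any character other than letters, digits and - _ . + separates words, and words beyond the fifth in a phrase are dropped. `parse_query` and `word_matches` are hypothetical names; `word_matches` shows the substring rule behind “*Fred*”.

```python
import re

MAX_PHRASE_WORDS = 5
# Letters, digits and - _ . + form words; any other character is a separator.
WORD_RE = re.compile(r"[A-Za-z0-9_.+\-]+")

def parse_query(query):
    """Split a search string into (phrases, words), lower-cased."""
    phrases, words = [], []
    query = query.replace("\u201c", '"').replace("\u201d", '"')
    for i, chunk in enumerate(query.split('"')):
        tokens = [t for t in (m.lower().rstrip(".") for m in WORD_RE.findall(chunk)) if t]
        if i % 2 == 1:   # odd chunks lie between a pair of quotes
            if tokens:
                # Assumption: words past the fifth are silently dropped.
                phrases.append(tokens[:MAX_PHRASE_WORDS])
        else:
            words.extend(tokens)
    return phrases, words

def word_matches(term, indexed_word):
    """'fred' matches Fred, *Fred, Fred* and *Fred*. In MySQL this would be
    LIKE '%term%', with % and _ in the term escaped."""
    return term in indexed_word
```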
Search results will be ranked according to the following rules:
Highest priority – word/phrase found in the <title> … </title> tag
2nd highest priority – word/phrase found in an <hx> heading tag: <h1> first, then <h2>, etc.
3rd priority – word/phrase found on the page as an exact match
4th priority – words or part of the phrase found on the page, but not together
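A sketch of the four ranking rules above as a sort key, assuming each page is available as a hypothetical dict of title, heading list of (level, text) pairs and body text. A page's best tier decides its place; within the heading tier a lower heading number ranks higher, within the "not together" tier more matched words rank higher, and remaining ties are broken by URL. `rank_key` and `rank` are hypothetical names; a PHP/MySQL version would apply the same ordering to stored matches.

```python
import re

WORD_RE = re.compile(r"[a-z0-9][a-z0-9_.+\-]*")

def norm(text):
    """Lower-case text and reduce it to single-space-separated words."""
    return " ".join(m.group(0).rstrip(".") for m in WORD_RE.finditer(text.lower()))

def rank_key(page, terms):
    """Return (tier, secondary); tier 0 means no match."""
    phrase = " ".join(terms)
    title = norm(page["title"])
    heads = [(level, norm(text)) for level, text in page["headings"]]
    body = norm(page["body"])
    if phrase in title:
        return (4, 0)                    # highest: found in <title>
    levels = [lvl for lvl, text in heads if phrase in text]
    if levels:
        return (3, -min(levels))         # 2nd: <h1> beats <h2>, and so on
    if phrase in body:
        return (2, 0)                    # 3rd: exact match on the page
    everything = " ".join([title, body] + [t for _, t in heads])
    found = sum(1 for t in terms if t in everything)
    return (1, found) if found else (0, 0)   # 4th: words present but apart

def rank(pages, terms):
    """Return the URLs of matching pages, best first."""
    scored = [(rank_key(p, terms), url) for url, p in pages.items()]
    scored = [s for s in scored if s[0][0] > 0]
    scored.sort(key=lambda s: (-s[0][0], -s[0][1], s[1]))
    return [url for _, url in scored]
```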
Output from the search routine should be formatted using the CSS styles that will be supplied. If required, search output will be limited to X hits and may start from hit number Y; for example, display hits 20–30.
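A minimal pagination sketch for the X/Y limit. It reads "hits 20–30" as inclusive, i.e. start at hit 20 and show 11 hits; that reading is an assumption. `paginate` is a hypothetical name; in MySQL the same window would be `LIMIT 11 OFFSET 19`.

```python
def paginate(hits, limit, start=1):
    """Return up to `limit` hits beginning at 1-based hit number `start`."""
    if limit < 1 or start < 1:
        raise ValueError("limit and start must both be >= 1")
    offset = start - 1          # convert the 1-based hit number to a list index
    return hits[offset:offset + limit]
```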
Installing the script into a site must be simple and supported by good documentation. The search engine is intended to be added to a number of sites ranging in size from 50 to 1,000 pages.
This requirement calls for a reasonably fast turnaround (while keeping to the specification); please indicate the time needed to deliver the completed project.
We will do our best to complete your project as soon as possible and to the highest quality!
KRONIKS Ltd - we are a team of professional programmers and designers.
Please visit our site to learn more about us, and feel free to contact us.