IIT Bombay and HP going the Wolfram Alpha way?

by Mohit Jain | 2:59 AM

Hewlett-Packard (HP) Labs and IIT Bombay are working together on a search engine that can return relevant information for search queries quickly, reports Business Standard.

The Computer Science Department of IIT-B is among the academic groups around the world that received grants under a program HP Labs launched last year.

The team, along with Professor Soumen Chakrabarti at IIT-B, used this grant to work on a new search engine that will trawl the web to provide relevant answers to queries.

The team has already created billions of annotation links between a corpus of 500 million web pages and millions of entities known to Wikipedia. The data is being processed on 42 high-end HP servers with over 350 gigabytes of RAM and over 150 terabytes of disk storage, which were donated by Yahoo. HP Labs and Microsoft Research have provided additional research funding.

The initial results are encouraging. As Sayali Kulkarni, a student working on the project, says, "The search for quantity queries get answered in 2-5 seconds." The engine will even handle queries such as "how old is Feng Shui" or the number of AIDS-affected people in the world, adds Prof. Chakrabarti.

The search engine is designed to understand a wider range of queries and to respond with information nuggets and tables rather than just links to pages, which sets it apart from other search engines.

Queries like "length of the Nile River" or "maximum speed of a Mercedes-Benz SLR McLaren" can be answered from encyclopedic sources like Wikipedia, but in many cases those sources do not cover the query, and the engine needs support from unstructured web text such as news and blogs. For each query, the system can aggregate tens of thousands of snippets into quantitative answers.
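The article does not say how the snippets are combined, so here is only a minimal sketch of the general idea, under loudly stated assumptions: each snippet is scanned with a simple regular expression for a number followed by a length unit, every match is converted to kilometres, and the pooled candidates are reduced to their median. The unit table, the regex, and the function names `extract_lengths_km` and `aggregate_answer` are all hypothetical, and the median is just one robust choice, not the HP-IIT-B method.

```python
import re
from statistics import median

# Hypothetical unit table: factor that converts each unit to kilometres.
UNIT_TO_KM = {
    "km": 1.0,
    "kilometres": 1.0,
    "kilometers": 1.0,
    "mi": 1.609344,
    "miles": 1.609344,
}

# Assumed pattern: a number (optional thousands separators and decimals)
# followed by one of the unit words above.
QUANTITY_RE = re.compile(
    r"(\d[\d,]*(?:\.\d+)?)\s*(km|kilometres|kilometers|mi|miles)\b",
    re.IGNORECASE,
)


def extract_lengths_km(snippet):
    """Return every length mentioned in one snippet, converted to kilometres."""
    values = []
    for number, unit in QUANTITY_RE.findall(snippet):
        values.append(float(number.replace(",", "")) * UNIT_TO_KM[unit.lower()])
    return values


def aggregate_answer(snippets):
    """Pool candidate values from all snippets and return their median (None if none)."""
    candidates = [v for s in snippets for v in extract_lengths_km(s)]
    return median(candidates) if candidates else None


snippets = [
    "The Nile is about 6,650 km long.",
    "At 4,130 miles, the Nile is the longest river.",
    "Some sources put the Nile at 6,695 kilometres.",
    "A 12 km stretch near Aswan was surveyed.",
]
print(round(aggregate_answer(snippets), 1))
```

The median keeps a stray mention like the 12 km stretch from dragging the answer away from the consensus of the other snippets; a real system would also need to decide which numbers actually refer to the queried quantity.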

To be successful, a search engine needs a robust mechanism for indexing web pages, as there are billions of pages on the internet at any given time. Google has over eight billion pages and over 1.1 billion images indexed.

Annotation is the backbone of the HP-IIT-B engine: annotations are indexed alongside ordinary text, and a query language can combine categories, annotations, quantities and regular text in creative ways, typically ending with evidence aggregation. "The key to moving up in the search value chain is to add semi-structured knowledge to the unstructured corpus, in the form of type, entity, category and relationship annotations, to index these annotations along with the text, and open up search application programming interfaces (APIs) and query languages to probe these indices and aggregate the resulting knowledge," says Prof. Chakrabarti.
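To make "indexing annotations along with the text" concrete, here is a toy sketch, not the project's actual index. It assumes annotations are stored as extra tokens with a hypothetical prefix format such as `type:river` or `entity:Nile`, placed in the same inverted index as ordinary words, so one boolean AND query can mix a category with plain keywords. The class name `AnnotatedIndex` and the token format are illustrative assumptions; quantities and evidence aggregation are left out.

```python
from collections import defaultdict


class AnnotatedIndex:
    """Toy inverted index holding plain words and annotation tokens side by side."""

    def __init__(self):
        # token -> set of document ids containing that word or annotation
        self.postings = defaultdict(set)

    def add(self, doc_id, text, annotations):
        # Index ordinary words (lowercased, trailing punctuation stripped).
        for word in text.lower().split():
            self.postings[word.strip(".,;:!?\"'")].add(doc_id)
        # Index annotations (hypothetical "type:..." / "entity:..." tokens).
        for ann in annotations:
            self.postings[ann].add(doc_id)

    def search(self, *tokens):
        """Return ids of documents matching every token (boolean AND)."""
        sets = [self.postings.get(t, set()) for t in tokens]
        return set.intersection(*sets) if sets else set()


idx = AnnotatedIndex()
idx.add(1, "The Nile flows through Egypt.", ["type:river", "entity:Nile"])
idx.add(2, "Nile Rodgers is a guitarist.", ["type:person", "entity:Nile_Rodgers"])
idx.add(3, "The Amazon flows through Brazil.", ["type:river", "entity:Amazon_River"])

print(idx.search("nile", "type:river"))
```

The keyword "nile" alone matches both the river and the musician; adding the `type:river` annotation to the query narrows the result to document 1, which is the kind of disambiguation a text-only index cannot express.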
