Information Retrieval Facility
The Information Retrieval Facility (IRF), founded in 2006 and located in Vienna, Austria, is a research platform that supports networking and collaboration among professionals in the field of information retrieval.
The IRF has members in the following categories:
• Researchers in information retrieval (IR) or related scientific areas
• Industrial/corporate information management professionals
• Patent authorities and governmental institutions
• Students in any of the above areas
The Scientific Board
C.J. van Rijsbergen, Chairman of the Scientific Board, Department of Computer Science, University of Glasgow
Jamie Callan, Professor, Language Technologies Institute, Carnegie Mellon University (CMU)
Yves Chiaramella, Professor Emeritus, Department of Computer Science and Applied Mathematics, Joseph Fourier University
Kilnam Chon, Professor, Computer Science Department, Korea Advanced Institute of Science and Technology (KAIST)
W. Bruce Croft, Distinguished Professor, Department of Computer Science, and Director, Center for Intelligent Information Retrieval, University of Massachusetts Amherst
Hamish Cunningham, Research Professor, Computer Science Department, University of Sheffield
Norbert Fuhr, Professor, Institute of Informatics and Interactive Systems, University of Duisburg-Essen
David Hawking, Science Leader, Project Leader, CSIRO ICT Centre
Arcot Desai Narasimhalu, Associate Dean, School of Information Systems, Singapore Management University
John Tait, Chief Scientific Officer of the IRF; until July 2007, Professor of Intelligent Information Systems and Associate Dean of the School of Computing and Technology
Scientific Goals
• Modelling innovative and specialised information retrieval systems for global patent document collections.
• Investigating and developing an adequate technical infrastructure that allows interactive experimentation with formal, mathematical retrieval concepts for very large-scale document collections.
• Studying the usability of multimodal user interfaces to very large-scale information retrieval systems.
• Integrating real users with actual information needs into the research process of modelling information retrieval systems to allow accurate performance evaluation.
• Creating different views of patent data depending on the focus of the information need.
• Defining standardised methods for benchmarking the information retrieval process in patent document collections.
• Handling the text and non-text parts of a patent in a coherent manner.
• Designing, experimenting with, and evaluating search engines that can retrieve structured and semi-structured documents in very large-scale patent collections.
• Integrating the temporal dimension of patent documents in retrieval strategies.
• Improving effectiveness and precision of patent retrieval, based on ontologies and natural-language understanding techniques.
• Refining IR methods that allow unstructured querying by exploiting available structure within the patent documents.
• Formally (mathematically) identifying and specifying relevant business information needs in the field of intellectual property information.
• Investigating efficient scaling mechanisms for information retrieval taking into account the characteristics of patent data.
• Investigating and experimenting with computing architectures for very high-capacity information management.
• Establishing an open eScience platform that enables a standardised and easy way of creating and performing IR experiments on a common research infrastructure.
• Discovering and investigating novel use cases and business applications deriving from intellectual property information.
• Enabling formal research in information retrieval, natural language processing and semantic processing to grow into an applied science in a global, industrial context.
• Developing and integrating different information access methods.
• Researching effective methods for interactive information retrieval.
Semantic Supercomputing
Current technologies for extracting concepts from unstructured documents are extremely computationally intensive. To allow interactive experimentation with large, rich text corpora, the IRF has built a high-performance computing environment that incorporates the following technologies:
• multi-node clusters (currently 80 cores, up to 1024)
• high-speed interconnect technology
• single system image with large compound memory (currently 320 GB, up to 4 TB)
• fully integrated configurable computing (currently 4 FPGA cores, up to 256)
The IRF uses the term semantic supercomputing for its combination of these high-performance computing (HPC) features to accelerate text mining.
The World Patent Corpus
The IRF aims to bring state-of-the-art information retrieval technology to the community of patent information professionals. It expects information retrieval (IR) technology to become the focus of information technology very soon, and holds that all industry sectors can profit from applying current and future text mining processes to the special requirements of patent research. Although the ideas and concepts apply to all sorts of intellectual property information, patents require the most sophistication and pose challenging technical and organisational problems. The entire body of patent-related documents possibly constitutes the largest corpus of compound documents, making it a rewarding target for text mining scientists and end users alike.
Moreover, patents have become a crucial issue, in particular for large global corporations and universities. The industrial users of patent data are among the most demanding and important information professionals. As a consequence, they could benefit the most from technology that relieves the burden of researching the large body of patent information.
The eScience Environment
The IRF implements a modern eScience infrastructure for experimentation in IR, using innovative information technology to support the discovery of new methods. The IRF eScience platform lets scientists use semantic supercomputing resources transparently for algorithm development, working from their familiar lab tools.