Dynamic Composition
of Information Retrieval Techniques
Shlomo Zilberstein, Principal Investigator
Department of Computer Science
140 Governor's Drive
University of Massachusetts
Amherst, MA 01003-4610
Phone: (413) 545-4189
Fax: (413) 545-1249
Email: shlomo@cs.umass.edu
URL: http://www.cs.umass.edu/~shlomo/ |
James Allan, Co-Principal Investigator
Department of Computer Science
140 Governor's Drive
University of Massachusetts
Amherst, MA 01003-4610
Phone: (413) 545-3240
Fax: (413) 545-1789
Email: allan@cs.umass.edu
URL: http://www.cs.umass.edu/~allan/ |
WWW Page
URL: http://anytime.cs.umass.edu/shlomo/research/DCIR.html
Project Award Information
- Award Number: IIS-9907331
- Duration: 01/01/2000 - 12/31/2002
- Title: Dynamic Composition of Information
Retrieval Techniques
Keywords
Progressive processing, information retrieval,
search engines, monitoring and control of computation, resource-bounded
reasoning, Markov decision process
Project Summary
This project is a collaboration between
the Information Retrieval Center and the Resource-Bounded Reasoning
Lab at the University of Massachusetts. It is aimed at developing
a dynamic approach for the construction of information retrieval
search engines that will improve their flexibility, quality of
service and robustness. Currently, these systems are built by
integrating a fixed set of modules and techniques that perform
tasks such as query formation, query optimization, query evaluation,
precision improvement, and recall improvement. The new approach
consists of context-dependent mechanisms for optimal selection
of information retrieval techniques based on a probabilistic description
of their performance. The approach addresses the high level of
uncertainty regarding the run-time of complex retrieval techniques
and their impact on the quality of the results. The current static
approach to integration of information retrieval modules continues
to produce small performance gains, but the systems are extremely
specialized for each task, and it is not clear how well results
generalize to new types of retrieval. This project offers an alternative
approach with significant advantages because it allows a system
to configure itself dynamically to the specific task at hand,
to the person using the system, and to limited computational resources.
Project References
- S. Zilberstein, A.I. Mouaddib, and A.
Arnt. Managing Computational Resources in Dynamic Environments
Using Opportunity Cost. Submitted to Journal of Artificial
Intelligence Research, 2000.
- S. Zilberstein and A.I. Mouaddib. Optimizing
Resource Utilization in Planetary Rovers. Proceedings of the
2nd NASA International Workshop on Planning and Scheduling for
Space, 163-168, 2000.
- S. Zilberstein and A.I. Mouaddib. Reactive
Control of Dynamic Progressive Processing. Proceedings of
the 16th International Joint Conference on Artificial Intelligence,
1268-1273, Stockholm, Sweden, 1999.
- A.I. Mouaddib and S. Zilberstein. Optimal
Scheduling of Dynamic Progressive Processing. Proceedings
of the 13th Biennial European Conference on Artificial Intelligence,
449-503, Brighton, UK, 1998. (ECAI-98 Best Paper Award.)
Collaborators
Dr. Abdel-Illah Mouaddib from centre
de recherches en informatique de Lens (CRIL), France, is an
international collaborator participating in this project.
Project Impact
We have completed the first year of the
project and can report the following impact:
- Technology: using machine learning
to predict the performance of IR techniques. We have analyzed a statistical query expansion
technique (LCA) and found that we can predict more than half
of the situations in which it will be useful. This result is
critical to the success of the project, and is the first time
this effect has been seen in IR.
- Technology: reusable meta-level control
policies for progressive processing tasks.
We have developed a technique to select the best subset of IR
techniques for a given query by solving a corresponding Markov
decision problem. In order to factor the effect of the load on
the system, the reward for processing a query includes an opportunity
cost -- the cost of taking computational resources away from
the remaining queries. Several fast approximation schemes for
the opportunity cost have been developed, providing an effective
framework for managing computational resources in highly dynamic
environments.
- Technology: IR system design. Partly funded by this
project, the CIIR is
re-architecting its core information retrieval system into a
thread-based system of cooperating components. The new architecture,
called Metamorph, is being designed to allow configuration of an IR
system for a wide range of tasks. The architecture is influenced by
the goals of this project--i.e., rapid and lightweight reconfiguration
of a system based on the meta-level control policies.
- Graduate education: The project supports two doctoral students, one
in the Resource-Bounded Reasoning Lab (Andrew Arnt) and one in
the Information Retrieval Center (we are currently recruiting
a student to fill this position). The students working on the
project receive training in both AI and IR. Andrew Arnt is successfully
completing the "portfolio" requirements of the Ph.D.
program.
We anticipate that the completion of the
project will have the following additional impact:
- Performance gains: We anticipate substantial gains in the quality
of service of information retrieval search engines both under
normal operation and particularly under increased workload.
- Robustness:
The resulting system will be able to respond to a high workload
by reducing the amount of computation needed to process each
query and dynamically adjusting the quality of service.
- Open system design: The resulting system will facilitate sharing and
integration of information retrieval techniques developed by
others by establishing a standard for exchanging intermediate
results within an open system architecture.
Goals, Objectives, and Targeted Activities
The objective of this project is to construct
a new approach to building information retrieval search engines
that addresses the following key challenges:
- Composability of information retrieval
techniques: How to unify the specifications
of a variety of existing information retrieval techniques so
that they can be used as plug-in components of a complete system?
- Identifying computational tradeoffs
in information retrieval: How to
characterize the computational cost and the benefits of a particular
IR technique? How to measure the interdependency between different
techniques and the characteristics of the input query?
- Monitoring and control: How to measure and monitor the progress that
the system makes with a specific task? How to handle the uncertainty
associated with both quality and duration of each module? How
to dynamically select the best IR techniques in a particular
situation with minimal computational overhead?
- Scalability of resource-bounded reasoning
techniques: By definition, information
retrieval tasks involve large collections and a substantial amount
of test data. One of the challenges of this project is to address
the scalability of resource-bounded reasoning techniques to handle
very large problem domains.
- Dynamic configuration of search engines: From an information retrieval perspective, this
project will test the ability to move from a careful, design-time
system configuration to a more flexible approach based on run-time
analysis of the specific characteristics of the task.
To achieve these objectives, we are planning
the following activities for the second year of the project: improve
the machine learning techniques we have used to predict performance;
use larger data sets for evaluation; modify
the task model to reflect the lessons learned from the first year
(for example, partial execution of a technique may be needed to
better predict its performance); test the efficacy of the Metamorph
architecture for lightweight reconfiguration of a system;
develop a prototype of a search
engine composed of several techniques that offer a time/quality
tradeoff; and evaluate the search engine and the approach we
developed to control it.
Area Background
The ability to dynamically adjust computational
effort based on the availability of computational resources has
been studied extensively by the AI community since the mid 1980's.
These efforts have led to the development of a variety of techniques
such as anytime algorithms [Dean & Boddy, 1988; Zilberstein
& Russell, 1996], design-to-time [Garvey & Lesser,
1993], flexible computation [Horvitz, 1987], and progressive
reasoning [Mouaddib & Zilberstein, 1998]. Similar models
of computation have been studied within the systems community
in the area of imprecise computation [Liu et al., 1991].
In this project, we adapt the progressive processing framework
to build a resource-bounded reasoning system that is particularly
suited for dynamic composition of information retrieval techniques
[Jones & Willett]. The information retrieval techniques we
currently use are based on the extensive infrastructure available
at the Center for Information retrieval at UMass, and in particular
on components of the INQUERY system [Callan et al., 1992]. The
dynamic selection of modules is based on their marginal value,
which is determined by several factors: the characteristics of
the specific request, the quality of the intermediate result produced
by the system, the execution time of the module under consideration,
and its anticipated contribution to the overall quality of service.
Additional factors that can be taken into account are the consumer
of the results and the expected reward for the service.
The current static approach to integration
of information retrieval modules continues to produce performance
gains of about 10% each year [Allan et al., 1997], but the systems
are extremely specialized for each task, and it is not clear how
well results will generalize to new types of retrieval. We believe
that our new approach will provide significant advantages because
it allows a system to configure itself dynamically to the specific
task at hand. Our approach will allow systems to take advantage
of query processing techniques when it makes sense for them to
do so, and to include or exclude key components of the system
based on whether they are expected to work, and whether there
is time for them to be applied.
Area References
- T.L. Dean and M. Boddy. An analysis of
time-dependent planning. Proceedings of the 7th National Conference
on Artificial Intelligence, 49-54, Minneapolis, Minnesota,
1988.
- J.P. Callan, W.B. Croft, and S.M. Harding.
The INQUERY retrieval system. Proceedings of the 3rd International
Conference on Database and Expert Systems Applications, 78-83,
Valencia, Spain, 1992.
- J. Allan, J. Callan, W.B. Croft, L. Ballesteros,
D. Byrd, R. Swan, and J. Xu. INQUERY Does Battle with TREC-6.
The 6th Text Retrieval Conference (TREC-6), NIST Special
Publication, 1997.
- Croft, W.B. Combining Approaches to Information
Retrieval. To appear in Advances in Information Retrieval: Recent
Research from the CIIR, W. Bruce Croft (ed.) Kluwer Academic
Publishers, 2000.
- A. Garvey and V. Lesser. Design-to-time
real-time scheduling. IEEE Transactions on Systems, Man and
Cybernetics, 23(6):1491-1502, 1993.
- E.J. Horvitz. Reasoning about beliefs
and actions under computational resource constraints. Proceedings
of the 1987 Workshop on Uncertainty in Artificial Intelligence,
Seattle, Washington, 1987.
- K.S. Jones and P. Willett. (eds.) Readings
in Information Retrieval. Morgan Kaufmann Publishers, 1997.
- J. Liu, K. Lin, W. Shih, A. Yu, J. Chung,
and W. Zao. Algorithms for scheduling imprecise computations.
IEEE Transactions on Computers, 24(5):58-68, 1991.
- A.I. Mouaddib and S. Zilberstein. Optimal
scheduling of dynamic progressive processing. Proceedings
of the 13th Biennial European Conference on Artificial Intelligence,
449-503, Brighton, UK, 1998.
- S. Zilberstein and S. J. Russell. Optimal
composition of real-time systems. Artificial Intelligence,
82(1-2):181-213, 1996.
- S. Zilberstein and A.I. Mouaddib. Reactive
control of dynamic progressive processing. Proceedings of
the 16th International Joint Conference on Artificial Intelligence,
1268-1273, Stockholm, Sweden, 1999.
Potential Related Projects
The following web sites provide additional
information and links to related projects.