Discovering, Annotating and Modeling Information Sources
Integrating new services into existing integration systems (or "mash-ups") is an important problem. The diagram below shows the three phases of automated service discovery and integration. We first search for new services (using a search engine), then annotate the discovered services with the types of the parameters they take and return, and finally build a model of each new source in terms of the relationships between those parameters.
An Example
In the first phase we perform keyword-based search over a service registry or Web index (e.g., UDDI or Google) to find relevant services. In the example, the keyword "hotels" is given to a search engine, which returns a service called HotelLookup.
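A minimal sketch of keyword-based search, assuming a tiny in-memory registry that maps hypothetical service names to free-text descriptions (real registries such as UDDI, and Web indexes, are far richer). Services are ranked by how many query terms appear in their name or description:

```python
# Hypothetical registry: service name -> free-text description.
REGISTRY = {
    "HotelLookup": "find hotels near a zipcode with address and phone",
    "FlightStatus": "check the status of airline flights",
    "WeatherByZip": "current weather forecast for a zipcode",
}

def keyword_search(query, registry):
    """Return names of services sharing at least one term with the query,
    ordered by number of matching terms (ties broken alphabetically)."""
    terms = set(query.lower().split())
    scored = []
    for name, description in registry.items():
        words = set(description.lower().split()) | {name.lower()}
        score = len(terms & words)
        if score > 0:
            scored.append((-score, name))  # negate so higher scores sort first
    return [name for _, name in sorted(scored)]

print(keyword_search("hotels", REGISTRY))  # ['HotelLookup']
```

Exact term matching is the simplest possible choice here; it misses synonyms and word variants, which is one reason the research described below looks at classifying or clustering services.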
In the second phase, we determine what type of data the service takes as input and returns as output. This is done by assigning semantic types such as HotelName and PhoneNumber (as opposed to syntactic types like string and integer) to each attribute (Name, Num, etc.) of the service, as seen in the diagram.
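One simple way to assign semantic types from metadata labels alone is a nearest-neighbour classifier over character trigrams of the label. The sketch below assumes a small hypothetical set of previously annotated labels and type names; it illustrates the idea and is not the classifier used in the research described on this page.

```python
def trigrams(label):
    """Character trigrams of a lower-cased label, padded with '#' at both ends."""
    s = f"#{label.lower()}#"
    return {s[i:i + 3] for i in range(len(s) - 2)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical labels from previously annotated services, with their semantic types.
TRAINING = [
    ("HotelName", "HotelName"), ("hotel_name", "HotelName"),
    ("Phone", "PhoneNumber"), ("PhoneNum", "PhoneNumber"), ("Tel", "PhoneNumber"),
    ("Zip", "Zipcode"), ("ZipCode", "Zipcode"),
    ("Dist", "Distance"), ("Miles", "Distance"),
]

def classify(label):
    """Return the semantic type of the most similar training label,
    or None if no training label shares a trigram with it."""
    target = trigrams(label)
    best_type, best_score = None, 0.0
    for example, sem_type in TRAINING:
        score = jaccard(target, trigrams(example))
        if score > best_score:
            best_type, best_score = sem_type, score
    return best_type

print(classify("Name"), classify("Num"))  # HotelName PhoneNumber
```

Here "Num" is typed as PhoneNumber only because it shares the trigrams "num" and "um#" with "PhoneNum"; a short, ambiguous label like this is exactly where metadata alone becomes unreliable.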
In the third phase, we model the service by discovering how its input and output parameters relate to one another. This relationship is described by a database view definition (a conjunctive query in Datalog), as shown in the diagram. In our example, the source definition (at the bottom right of the figure) states that the service returns the addresses and phone numbers of all hotels that lie within a certain distance of a given zipcode (where the zipcode and distance are given as input).
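The figure's exact definition is not reproduced here, so the following uses a hypothetical Datalog rule matching the description above, with illustrative relation names hotel and distance, and a small Python sketch that evaluates its body over toy data. Input attributes are marked with $ (a convention assumed for this sketch), and the hotels, zipcodes and distances are made up.

```python
# Hypothetical source definition (relation names are illustrative):
#   HotelLookup($zip, $dist, name, addr, phone) :-
#       hotel(name, addr, phone, hzip), distance(zip, hzip, d), d <= dist.

HOTEL = [  # hotel(name, addr, phone, hzip)
    ("Hotel A", "1 Main St", "(555) 010-0001", "90292"),
    ("Hotel B", "2 Ocean Ave", "(555) 010-0002", "90401"),
    ("Hotel C", "3 Hill Rd", "(555) 010-0003", "10001"),
]
DISTANCE = {  # distance(zip1, zip2, d), stored as {(zip1, zip2): miles}
    ("90292", "90292"): 0.0,
    ("90292", "90401"): 4.2,
    ("90292", "10001"): 2450.0,
}

def hotel_lookup(zipcode, max_dist):
    """Evaluate the rule body for the bound inputs (zipcode, max_dist)."""
    results = []
    for name, addr, phone, hzip in HOTEL:      # hotel(name, addr, phone, hzip)
        d = DISTANCE.get((zipcode, hzip))      # distance(zip, hzip, d)
        if d is not None and d <= max_dist:    # d <= dist
            results.append((name, addr, phone))
    return results

print(hotel_lookup("90292", 5))  # Hotel A and Hotel B lie within 5 miles
```

Each subgoal in the rule body becomes one lookup or filter, and the shared variable hzip is what joins the hotel and distance relations.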
Note that, in general, real services have many more attributes and far more complicated definitions than our simple example.
Research
Research in the first phase of service discovery has concentrated on improving search performance by first classifying services into semantic domains (e.g., travel) [1] or by clustering similar services together [2].
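As a rough illustration of clustering, the sketch below groups services whose parameter-name sets overlap, using single-link clustering with a Jaccard-similarity threshold. The services, parameters and threshold are hypothetical, and this simplified scheme is not the algorithm described in [2].

```python
# Hypothetical services, each described by the names of its parameters.
SERVICES = {
    "HotelLookup": {"zip", "dist", "name", "phone"},
    "HotelFinder": {"zip", "name", "phone", "address"},
    "FlightStatus": {"airline", "flight", "status"},
    "FlightTracker": {"airline", "flight", "gate"},
}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(services, threshold=0.4):
    """Single-link clustering: repeatedly merge two clusters if any pair of
    their services has parameter-name Jaccard similarity >= threshold."""
    clusters = [{name} for name in sorted(services)]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(jaccard(services[a], services[b]) >= threshold
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i] |= clusters.pop(j)
                    merged = True
                    break  # the list changed, so restart the scan
            if merged:
                break
    return clusters

print(cluster(SERVICES))  # two clusters: the flight services and the hotel services
```

Once similar services are grouped, a keyword query that matches one member of a cluster can also surface the others.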
For the second phase of service discovery, researchers have demonstrated that classifiers can assign semantic types to input/output parameters using metadata labels ("Zip", "Name", "Num", etc.) as features [1]. More recently, the work in [3] extended the feature set to also include the data ("Ritz-Carlton", "(310) 823-1700", etc.) generated by the service. The approach requires active invocation of the service using example input tuples (e.g., <"90292","5">), but the resulting classifier outperforms one based on metadata alone.
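The benefit of adding data features can be sketched as follows, assuming hand-written value patterns per semantic type (in practice these would be learned) and a metadata classifier that outputs a score per type. Combining the two kinds of evidence by a weighted sum, and the weight alpha itself, are hypothetical choices made for illustration.

```python
import re

# Hypothetical value patterns per semantic type; a real system would learn
# these from example data rather than hand-write them.
PATTERNS = {
    "PhoneNumber": re.compile(r"^\(\d{3}\) \d{3}-\d{4}$"),
    "Zipcode": re.compile(r"^\d{5}$"),
    "Distance": re.compile(r"^\d+(\.\d+)?$"),
}

def data_scores(values):
    """Fraction of observed output values matching each type's pattern."""
    if not values:
        return {t: 0.0 for t in PATTERNS}
    return {t: sum(bool(p.match(v)) for v in values) / len(values)
            for t, p in PATTERNS.items()}

def combine(meta_scores, values, alpha=0.3):
    """Weighted sum of metadata scores (weight alpha) and data scores (1 - alpha)."""
    data = data_scores(values)
    types = set(meta_scores) | set(data)
    combined = {t: alpha * meta_scores.get(t, 0.0) + (1 - alpha) * data.get(t, 0.0)
                for t in types}
    # Iterate over sorted types so that ties are broken deterministically.
    return max(sorted(combined), key=combined.get)

# Metadata alone slightly prefers Distance for the ambiguous label "Num" ...
meta = {"Distance": 0.6, "PhoneNumber": 0.4}
# ... but the values returned by invoking the service look like phone numbers.
outputs = ["(555) 010-0001", "(555) 010-0002"]
print(combine(meta, outputs))  # PhoneNumber
```

The values only become available by invoking the service, which is the cost of the data-based approach noted in the text above.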
Research in the third phase of service discovery is more recent; its goal is to learn view definitions automatically. In [4, 5] we describe a system capable of inducing declarative source definitions automatically from examples of the data produced by a service. Our system actively invokes the new service to generate the example data, then searches the space of plausible source definitions (conjunctive queries) until it finds one that produces data similar to that observed.
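The generate-and-test idea can be sketched on toy data. Assume two already-modelled sources (hypothetical hotels_in_zip and nearby_zips), a few simulated outputs of the new service, and a small hand-enumerated set of candidate definitions. Each candidate is scored by the average Jaccard overlap between the tuples it produces and the tuples observed. The search and scoring used by the system in [4, 5] are more involved, so this is only an outline of the approach.

```python
# Hypothetical known sources whose semantics are already modelled.
HOTELS = {  # hotels_in_zip(zip, name, phone)
    "90292": [("Hotel A", "(555) 010-0001")],
    "90401": [("Hotel B", "(555) 010-0002")],
    "10001": [("Hotel C", "(555) 010-0003")],
}
DIST = {("90292", "90292"): 0.0, ("90292", "90401"): 4.2, ("90292", "10001"): 2450.0}

def nearby_zips(zipcode, max_dist):  # nearby_zips(zip, dist, zip2)
    return [z2 for (z1, z2), d in DIST.items() if z1 == zipcode and d <= max_dist]

# Simulated tuples returned by invoking the new service on sample inputs.
OBSERVED = {
    ("90292", 5): {("Hotel A", "(555) 010-0001"), ("Hotel B", "(555) 010-0002")},
    ("90292", 1): {("Hotel A", "(555) 010-0001")},
}

# A tiny slice of the space of candidate bodies for
#   NewSource($zip, $dist, name, phone) :- <body>.
CANDIDATES = {
    "hotels_in_zip(zip, name, phone)":
        lambda z, d: set(HOTELS.get(z, [])),
    "nearby_zips(zip, dist, z2), hotels_in_zip(z2, name, phone)":
        lambda z, d: {t for z2 in nearby_zips(z, d) for t in HOTELS.get(z2, [])},
    "hotels_in_zip(z2, name, phone)":
        lambda z, d: {t for rows in HOTELS.values() for t in rows},
}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

def score(candidate):
    """Mean overlap between the candidate's output and the observed output."""
    return sum(jaccard(candidate(z, d), out)
               for (z, d), out in OBSERVED.items()) / len(OBSERVED)

best = max(CANDIDATES, key=lambda body: score(CANDIDATES[body]))
print(best)  # nearby_zips(zip, dist, z2), hotels_in_zip(z2, name, phone)
```

The first candidate scores 0.75 because it misses Hotel B when the distance is 5, and the third scores 0.5 because it ignores the inputs entirely; only the join through nearby_zips reproduces every observed tuple.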
References
[1] Learning to Attach Semantic Metadata to Web Services. Andreas Heß and Nicholas Kushmerick. In 2nd International Semantic Web Conference (ISWC), 2003.
[2] Similarity Search for Web Services. Xin Dong, Alon Y. Halevy, Jayant Madhavan, Ema Nemes, and Jun Zhang. In Proceedings of VLDB, 2004.
[3] Automatically Labeling the Inputs and Outputs of Web Services. Kristina Lerman, Anon Plangprasopchok, and Craig Knoblock. In Proceedings of AAAI-2006, Boston, MA, USA, 2006.
[4]
[5]