Research Interests and Expertise
Dr. Carman is an expert in Data Science, Data Mining and the analysis of Big Data. His primary research interests are Information Retrieval and Machine Learning. His research interests span theoretical studies (e.g. investigating statistical properties of information retrieval measures) through to practical applications (e.g. technology for assisting police during digital forensic investigations).
Dr. Carman has authored a large number of publications in prestigious venues, including full papers at SIGIR, KDD, IJCAI, CIKM, ECIR, WSDM, HT, CoNLL, EACL, HCOMP and ICDAR, and articles in TOIS, IR, JMLR, ML, PR, JAIR, CS&L, JASIST, DI and CSUR.
Major contributions of his research career have included developing state-of-the-art techniques for:
- learning Web search ranking functions and transferring knowledge across collections,
- fast Machine Learning algorithms allowing systems to scale up to very large datasets,
- clustering high-dimensional data using density-based and subspace-clustering techniques,
- improving quality control for crowd-sourcing applications,
- accelerating digital forensic investigations and analysis of the Dark Web,
- personalising Web search results to the interests of individual searchers,
- characterising, detecting and generating sarcasm in text,
- optical character recognition error correction for Indic languages,
- efficiently and accurately evaluating Named Entity extraction systems,
- product recommendation based on user ratings and click data,
- modelling the category structure in Wikipedia for document classification and clustering,
- ranking of weblogs based on content and expressed sentiment,
- analysing and leveraging tag data in information retrieval,
- routing queries to appropriate collections within search engines,
- learning semantic descriptions of online services,
- planning for the automated composition of web services, and
- managing data repositories on a data grid.
Research Projects
Dr. Carman is involved in a variety of research projects; some of his recent ones are summarised below.
- Learning Web Search ranking functions
- Transferring rankers across collections
- Scaling classifiers to massive datasets
- Identifying arbitrarily-shaped clusters in high dimensions
- Quality control for crowd-sourcing applications
- Accelerating digital forensic investigations
- Detecting and generating sarcasm in text
- Error correction in OCR for Indic languages
- Estimating user expertise in social media
- Efficient evaluation of NLP APIs
- Modelling category hierarchies in Wikipedia
- Personalising search and product recommendation
A Brief Bio (see also CV)
Mark Carman is a senior lecturer at Monash University in Melbourne, Australia. He joined Monash in 2010 after a postdoc at the University of Lugano. He received his PhD from the University of Trento in 2006 having worked at both the Fondazione Bruno Kessler (FBK-IRST) and the Information Sciences Institute (USC-ISI). Mark works primarily in information retrieval, applying and extending statistical machine learning techniques to the modelling of users and user-generated content. He has served on the program committees of many IR/DM conferences (SIGIR, ECIR, KDD, CIKM, EMNLP, AAAI, ACML, etc.) and is an Associate Editor for TOIS.
PhD Students
Graduated:
Current:
- Janis Dalins, Accelerating digital forensic investigations
- Yuan Jin, Improving the reliability of crowd-sourced data
- Li Pengfei, Unsupervised transfer learning techniques for learning to rank
- Rohit Saluja, Improving reliability of optical character recognition technology for Indic scripts
Selected Publications
The following is a list of selected publications. Please send an email if you can't get access to a particular paper.
Software
EIDOS (Efficiently Inducing Definitions for Online Sources) is a system for learning semantic descriptions of online information sources (such as RSS feeds). The descriptions are used to automatically integrate the sources into (mediator-based) information integration systems. A complete description of the purpose and functionality of the system can be found in my thesis. You can also have a look at the slides I presented at my defense. The software can be downloaded from the ISI website. It is royalty-free for research purposes and comes with all the source code. Here is the latest documentation. Feel free to contact me with installation questions.
Monash Personal Page Disclaimer
This web page is not authorized by Monash University and any opinions expressed on the page are those of the author and not those of the University.