SIGIR 2015 Tutorials

Full day

Room 1: Building and Using Models for Information Seeking, Search and Retrieval.

Understanding how people interact with information systems when searching is central to the study of Interactive Information Retrieval (IIR).

This full-day tutorial focuses on explaining and building formal models of Information Seeking and Retrieval. The tutorial is structured into four sessions. In the first two sessions we will discuss the rationale for modelling and examine a number of early formal models of search (including early cost models and the Probability Ranking Principle). Then we will examine more contemporary formal models (including Information Foraging Theory, the Interactive Probability Ranking Principle, and Search Economic Theory). The focus will be on the insights and intuitions that we can glean from the math behind these models. The latter sessions will be dedicated to building models that optimise the particular objectives driving how users make decisions, along with a how-to guide on model building, where we will describe different techniques (including analytical, graphical and computational) that can be used to generate hypotheses from such models. In the final session, participants will be challenged to develop a simple model of interaction applying the techniques learnt during the day, before concluding with an overview of challenges and future directions.

This tutorial is aimed at participants wanting to know more about the various formal models of information seeking, search and retrieval that have been proposed. The tutorial will be presented at an introductory level, and is designed to support participants who want to understand such models as well as build them.

http://www.dcs.gla.ac.uk/access/miss/

Leif Azzopardi is a Senior Lecturer within the School of Computing Science at the University of Glasgow. Leif works on modelling information access and retrieval problems to understand how and why people search for information, how search systems can be improved, and how such systems shape and influence their users and the user population. To address such aims, his research focuses on using and building formal models for Information Retrieval - drawing upon a range of different disciplines for inspiration, such as Quantum Mechanics, Operations Research, Economics, Transportation Planning and Gamification.

Guido Zuccon is a lecturer at the Queensland University of Technology (QUT). His research interests include formal models of search, ranking principles for IR, and retrieval models for health search.

Guido has actively contributed to the area of document ranking and search result diversification. He has performed extensive analyses of document ranking principles, introduced the quantum probability ranking principle and was the first to empirically evaluate the interactive PRP. His work on result diversification based on facility location analysis received the best paper award at ECIR 2013; he also received a best reviewer award at ECIR 2014.

Guido obtained a Ph.D. in Computing Science from the University of Glasgow in 2012. Before joining QUT in 2014, he was a postdoctoral research fellow at the CSIRO, Australia.

Room 3: Information Retrieval with Verbose Queries.

Recently, the focus of many novel search applications has shifted from short keyword queries to verbose natural language queries. Examples include question answering systems and dialogue systems, voice search on mobile devices and entity search engines like Facebook's Graph Search or Google's Knowledge Graph. However, the performance of textbook information retrieval techniques for such verbose queries is not as good as that for their shorter counterparts. Thus, effective handling of verbose queries has become a critical factor for adoption of information retrieval techniques in this new breed of search applications.

Over the past decade, the information retrieval community has deeply explored the problem of transforming natural language verbose queries into more effective structural representations, using operations like reduction, weighting, expansion, reformulation and segmentation. However, thus far, there has not been a coherent and organized tutorial on this topic. In this tutorial, we aim to put together the various research pieces of the puzzle, provide a comprehensive and structured overview of the proposed methods, and list application scenarios where effective verbose query processing can make a significant difference.

http://ciir.cs.umass.edu/~bemike/verbose-queries-SIGIR2015.html

Michael Bendersky is a Senior Software Engineer at Google, where he works on organizing the world's information and making it universally accessible and useful. He received his Ph.D. from the University of Massachusetts Amherst in 2012. Michael has published more than 20 research papers on information retrieval with verbose natural language queries. His paper "Discovering Key Concepts in Verbose Queries", published at SIGIR 2008, has been widely cited as one of the seminal works in this research area. Since then, his papers on query segmentation, query expansion and query representations for information retrieval have appeared at top-tier academic conferences, including SIGIR, CIKM, WSDM, WWW, ACL and SIGKDD. Michael co-organized a successful series of workshops on "Query Representation and Understanding" held at SIGIR 2010 and 2011. He also served as a publicity chair for the WSDM 2014 conference.

Manish Gupta is a Senior Applied Scientist on the Bing team at Microsoft India R&D Private Limited in Hyderabad, India. He is also an Adjunct Faculty member at the International Institute of Information Technology, Hyderabad. He received his Masters in Computer Science from IIT Bombay in 2007 and his Ph.D. from the University of Illinois at Urbana-Champaign in 2013. Before this, he worked for Yahoo! Bangalore for two years. His research interests are in the areas of web mining, data mining and information retrieval. He has published more than 30 research papers in refereed journals and conferences, including WWW, SIGIR, ICDE, KDD, PKDD and SDM. He has also co-authored a book on Outlier Detection for Temporal Data.

Room 5: Music Retrieval and Recommendation.

In this tutorial, we give an introduction to the field of and state of the art in music information retrieval (MIR). The tutorial particularly spotlights the question of music similarity, which is an essential aspect in music retrieval and recommendation. Three factors play a central role in MIR research: (1) the music content, i.e., the audio signal itself, (2) the music context, i.e., metadata in the widest sense, and (3) the listeners and their contexts, manifested in user-music interaction traces. We review approaches that extract features from all three data sources and combinations thereof and show how these features can be used for (large-scale) music indexing, music description, music similarity measurement, and recommendation. These methods are further showcased in a number of popular music applications, such as automatic playlist generation and personalized radio stationing, location-aware music recommendation, music search engines, and intelligent browsing interfaces. Additionally, related topics such as music identification, automatic music accompaniment and score following, and search and retrieval in the music production domain are discussed.

Dr. Peter Knees is an assistant professor at the Department of Computational Perception at the Johannes Kepler University Linz. He holds a Master's degree in Computer Science from the Vienna University of Technology and a Ph.D. degree from the Johannes Kepler University Linz.

Since 2004, he has co-authored over 60 peer-reviewed conference and journal publications and has served as a program committee member for several conferences relevant to the fields of music, multimedia, and text IR, including ISMIR, ACM Multimedia, ECIR Tutorials, and Adaptive Multimedia Retrieval. He was an organizer of the International Workshop on Advances in Music Information Research series and the SIGIR 2014 Workshop on Social Media Retrieval and Analysis. He teaches graduate-level courses on Multimedia Search and Retrieval, Learning from User-generated Data, Multimedia Data Mining, and Intelligent Information Systems, and has given tutorials and lectures on music IR at ECIR, SIGIR, and RuSSIR.

Markus Schedl is an associate professor at the Department of Computational Perception at the Johannes Kepler University Linz.

He graduated in Computer Science from the Vienna University of Technology and earned his Ph.D. in Technical Sciences from the Johannes Kepler University Linz. Markus further studied International Business Administration at the Vienna University of Economics and Business Administration as well as at the Handelshögskolan of the University of Gothenburg, which led to a Master's degree.

Markus has (co-)authored more than 100 refereed conference papers and journal articles (among others, published in ACM Multimedia, SIGIR, ECIR, IEEE Visualization; Journal of Machine Learning Research, ACM Transactions on Information Systems, Springer Information Retrieval, IEEE Multimedia). Furthermore, he is associate editor of the Springer International Journal of Multimedia Information Retrieval, serves on various program committees, and has reviewed submissions to several conferences and journals (among others, ACM Multimedia, ECIR, IJCAI, ICASSP, IEEE Visualization; IEEE Transactions on Multimedia, Elsevier Data & Knowledge Engineering, ACM Transactions on Intelligent Systems and Technology, Springer Multimedia Systems).

His main research interests include web and social media mining, information retrieval, multimedia, and music information research.

Since 2007, Markus has given several lecture courses, among others "Music Information Retrieval", "Exploratory Data Analysis", "Multimedia Search and Retrieval", "Learning from User-generated Data", "Multimedia Data Mining", and "Intelligent Systems". He has also completed several guest lecturing stays at the Universitat Pompeu Fabra, Barcelona, Spain; Utrecht University, the Netherlands; Queen Mary, University of London, UK; and the Kungliga Tekniska Högskolan, Stockholm, Sweden.

Morning

Room 2.1: IR Evaluation: Designing an End-to-End Offline Evaluation Pipeline.

This tutorial aims to provide attendees with a detailed understanding of an end-to-end evaluation pipeline based on human judgments (offline measurement). The tutorial will give an overview of the state-of-the-art methods, techniques, and metrics necessary for each stage of the evaluation process. We will mostly focus on evaluating an information retrieval (search) system, but other tasks such as recommendation and classification will also be discussed. Practical examples will be drawn both from the literature and from real-world usage scenarios in industry.

Jin Young Kim graduated from UMass Amherst with a Ph.D. in Computer Science in 2012. He is currently an Applied Researcher on the Relevance Science team at Microsoft Bing, where he spends most of his time improving the measurement of search quality, consulting on challenging measurement issues, and establishing best practices on measurement across the company. He has published dozens of papers in the areas of ranking models, user modelling, and evaluation for IR. He was a lecturer on offline measurement in a recent Microsoft internal course for new employees, and has given numerous talks at conferences including SIGIR, ECIR, CIKM, WSDM and WWW.

Emine Yilmaz is an assistant professor in the Department of Computer Science at University College London and a research consultant for Microsoft Research Cambridge. She is the recipient of the Google Faculty Award in 2014/15. Her main interests are evaluating the quality of retrieval systems, modelling user behaviour, learning to rank, and inferring user needs while using search engines. She has published research papers extensively at major information retrieval venues such as SIGIR, CIKM and WSDM. She has previously given several tutorials on evaluation at the SIGIR 2012 and SIGIR 2010 Conferences and at the RuSSIR/EDBT Summer School in 2011. She has also organized several workshops on Crowdsourcing (WSDM 2011, SIGIR 2011 and SIGIR 2010) and User Modelling for Retrieval Evaluation (SIGIR 2013). She has served as one of the organizers of the ICTIR Conference in 2009, as the demo chair for the ECIR Conference in 2013, and as the PC chair for the SPIRE 2015 conference. She is also a co-coordinator of the Tasks Track in TREC 2015.

Room 4.1: Revisiting the Foundations of IR.

As we face an explosion of potential new applications for the fundamental concepts and technologies of information retrieval, ranging from ad ranking to social media, from collaborative recommending to question answering systems, many researchers are spending unnecessary time reinventing ideas and relationships that are buried in the prehistory of information retrieval (which, for many researchers, means anything published before they entered graduate school).

Much of today's received wisdom may be nothing more than the fossilized residue of lively debates concerning such things as estimation of value and evaluation of systems. Returning to those discussions may open the door to genuinely new insights.

On the other hand, many of the ideas that surface as "new" in today's super-heated research environment have very firm roots in earlier developments in fields as diverse as citation analysis, statistics, and pattern recognition. The purpose of this tutorial is to survey those roots, and their relation to the contemporary fruits on the tree of information retrieval, and to separate, as much as is possible in an era of increasing commercial secrecy about methods, the problems to be solved, the algorithms for solving them, and the heuristics that are the bread and butter of a working operation.

http://comminfo.rutgers.edu/~kantor/NEWLOOK/2015ShortVersion.pdf

Paul Kantor is Distinguished Professor of Information Science at Rutgers and a founding editor of the journal Information Retrieval. He serves as Research Director of the CCICADA Center for Advanced Data Analysis. Paul has worked on information retrieval and evaluating information systems since 1972. He is a Fellow of the American Association for the Advancement of Science, a Senior Life Member of the IEEE and a member of the American Statistical Association, ASIST and the ACM. He is co-editor of the first edition of the Springer Recommender Systems Handbook. His research has been supported by NSF, ARDA, DARPA, DHS, ONR, and other organizations. At Rutgers he is also a member of the DIMACS Center for Discrete Mathematics and Computer Sciences, and serves on the graduate faculties of Computer Science and Operations Research. His hobbies include flying, and he promises tutorial participants a smooth take-off and a safe landing.

Room 6: An Introduction to Click Models for Web Search.

Click models, probabilistic models of the behavior of search engine users, have been studied extensively by the information retrieval community during the last five years. We now have more than a handful of basic click models, inference methods, evaluation principles and applications for click models, that form the building blocks of ongoing research efforts in the area. One of the key aims of the tutorial is to bring these together and offer a unified perspective. To achieve this, we describe the basic click models, inference methods and evaluation principles. We supplement this with an account of available datasets and packages plus a live demo based on these. We also present click model applications accompanied by examples.

We expect the tutorial to be useful for both researchers and practitioners who want to develop new click models, use them in their own research in other areas, or apply the models described here to improve actual search systems.

In this introductory tutorial we give an overview of click models for web search. We show how the framework of probabilistic graphical models helps to explain user behavior, build new evaluation metrics and perform simulations. The tutorial is augmented with a live demo where participants have a chance to implement a click model and to test it on a publicly available dataset.

http://clickmodels.weebly.com/sigir-2015-tutorials.html

Aleksandr Chuklin is currently working on search problems at Google Switzerland. Apart from his projects at Google, he is also working with the Information and Language Processing Systems group at the University of Amsterdam on a number of research topics. His main research interests are modeling and understanding user behavior on a search engine result page. Aleksandr has a number of publications on click models and their applications at SIGIR, CIKM and ECIR. He is a PC member of WSDM and CIKM. He also has a solid background in industry, having worked with real-world IR problems at commercial companies, including Google and Yandex.

Ilya Markov is a postdoctoral researcher and SNF fellow at the University of Amsterdam. His research agenda builds around information retrieval methods for heterogeneous search environments. Ilya has experience in federated search, user behavior analysis, click models and effectiveness metrics. He is a PC member of leading IR conferences, such as SIGIR, WWW and ECIR, a PC chair of the RuSSIR 2015 summer school and a co-organizer of the HetUM workshop at CIKM 2015 and the IMine-2 task at NTCIR-12. Ilya is currently teaching an MSc course on web search and has previously taught information retrieval courses at the BSc and MSc levels and given tutorials at conferences and summer schools in IR (ECIR, RuSSIR). Ilya received his PhD from the University of Lugano in 2014.

Maarten de Rijke is professor of Computer Science at the University of Amsterdam. He leads Amsterdam Data Science, a collaboration of around 300 data scientists in the Amsterdam area, and the Ad de Jonge Center for Intelligence and Security Studies. De Rijke's research focuses on intelligent information access, with projects on self-learning search engines, semantic search, and social media analytics. He has over 600 publications to his name, is the current editor-in-chief for ACM TOIS, and is a past PC co-chair for SIGIR and CIKM. De Rijke has received numerous awards and academic, governmental and industrial grants, and has helped launch a number of start-ups.

Afternoon

Room 2.2: IR Evaluation: Modeling User Behavior for Measuring Effectiveness.

This half-day tutorial on IR evaluation combines an introduction to classical IR evaluation methods with material on more recent user-oriented approaches. We primarily focus on off-line evaluation, but some material on on-line evaluation is also covered. The broad goal of the tutorial is to equip researchers with an understanding of modern approaches to IR evaluation, facilitating new research on this topic and improving evaluation methodology for emerging areas.

Charles Clarke is a Professor in the School of Computer Science at the University of Waterloo, Canada. His research interests include information retrieval, web search, and text data mining. He has published on a wide range of topics, including papers related to question answering, XML, filesystem search, HCI, and statistical NLP, as well as the evaluation of information retrieval systems. He was a Program Co-Chair for SIGIR 2007 and 2014. He is currently Chair of the SIGIR Executive Committee, and Co-Editor-in-Chief of the "Information Retrieval Journal". He is a co-author of the graduate textbook "Information Retrieval: Implementing and Evaluating Search Engines", MIT Press, 2010.

Mark Smucker is an associate professor in the Department of Management Sciences at the University of Waterloo. Mark's recent work has focused on making information retrieval evaluation more predictive of actual human search performance. Mark has been a co-organizer of two TREC tracks, a co-organizer of the SIGIR 2013 workshop on modeling user behavior for information retrieval evaluation (MUBE) and the SIGIR 2010 workshop on the simulation of interaction. He is a recipient of the SIGIR best paper award (2012) for his work with Charles Clarke on the time-based calibration of effectiveness measures. He is also a recipient of the University of Waterloo, Faculty of Engineering's Teaching Excellence Award.

Room 4.2: Exploiting Wikipedia for Information Retrieval Tasks.

Wikipedia - the online encyclopedia - has long been used as a source of information for researchers, as well as being a subject of research itself. Wikipedia has been shown to be effective in recommender systems, sentiment analysis, validation and multiple domains in information retrieval. One of the reasons for Wikipedia's popularity among researchers and practitioners is the multiple types of information it contains, which enables practitioners to select the right "tool" for their respective tasks. In addition to its great potential, this multitude of information sources also poses a challenge: which sources of information are best suited for a specific problem and how can different types of data be combined?

This tutorial aims to provide a holistic view of Wikipedia's different features - text, links, categories, page views, editing history etc. - and explore the different ways they can be utilized in a machine learning framework. By presenting and contrasting the latest works that utilize Wikipedia in multiple domains, this tutorial aims to increase awareness among researchers and practitioners in these fields of the benefits of utilizing Wikipedia in their respective domains, and in particular of using multiple sources of information simultaneously.

http://vitiokm.wix.com/wikitutorial

Prof. Bracha Shapira is an Associate Professor of Information Systems and Software Engineering at Ben-Gurion University of the Negev and the chair of the department. Dr. Shapira's main interests are in the fields of recommender systems, personalization, and information retrieval, and in utilizing crowd wisdom from social media to enhance these services. She has led many research projects in these fields, has published nearly 100 reviewed papers in journals and conferences, and has become a known expert in these areas.

Victor Makarenkov is an experienced practitioner and product manager, with nearly 10 years of experience in designing, implementing and deploying large-scale information retrieval systems. He is currently pursuing his PhD degree in the department of Information Systems Engineering, Ben-Gurion University of the Negev, Israel.

Nir Ofek is currently in the last period of his PhD studies (dissertation under review) at the department of Information Systems Engineering, Ben-Gurion University of the Negev, Israel. His research lies at the intersection of text mining and machine learning, and aims to develop general and effective techniques for understanding latent opinions in human language.

Room 6: Advanced Click Models and their applications to IR.

This tutorial covers more advanced and more recent topics in the area of click models. Here, we discuss recent developments in the area with a particular focus on applications of click models. The tutorial features a guest talk and a live demo where participants have a chance to build their own advanced click model.

While this is the second of the two half-day tutorials, participants are not required to attend the first one. At the beginning of this part, a short introduction to basic click models will be given so that all participants share a common vocabulary. Then, recent advances in click models will be discussed.

http://clickmodels.weebly.com/sigir-2015-tutorials.html

Santiago, Chile

August 9-13, 2015

The 38th Annual ACM SIGIR Conference
