Plenary Talks


Monotone, but not boring:
Some encounters with non-monotonicity

Bernard De Baets, Ghent University (Belgium)

Slides

In many modelling problems, there exists a monotone relationship between one or more of the input variables and the output variable, although, due to data imperfections, this relationship may not be fully reflected in the observed input-output data. Monotonicity is also a common property of evaluation and selection procedures. In contrast to a local property such as continuity, monotonicity is global in nature, and any violation of it is therefore simply unacceptable. We explore several problem settings where monotonicity matters, including fuzzy modelling, machine learning and decision making.
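The global nature of monotonicity makes it straightforward to audit: it suffices to check every ordered pair of observations. A minimal sketch (hypothetical code, not from the talk) for scalar inputs:

```python
# Count monotonicity violations in observed input-output data:
# every pair of points whose inputs are ordered should have
# outputs ordered the same way. Illustrative code, not the talk's.

def monotonicity_violations(data):
    """Count pairs with x_i <= x_j but y_i > y_j.

    `data` is a list of (x, y) pairs with scalar inputs; real
    problems would use the component-wise order on input vectors.
    """
    violations = 0
    for i, (xi, yi) in enumerate(data):
        for xj, yj in data[i + 1:]:
            lo, hi = (yi, yj) if xi <= xj else (yj, yi)
            if lo > hi:
                violations += 1
    return violations

clean = [(1, 10), (2, 12), (3, 15)]
noisy = [(1, 10), (2, 9), (3, 15)]   # the middle point breaks monotonicity
print(monotonicity_violations(clean))  # 0
print(monotonicity_violations(noisy))  # 1
```

On vector-valued inputs the comparison would use the component-wise order, under which not every pair is comparable.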



Cartification: from Similarities to Itemset Frequencies
Bart Goethals, University of Antwerp (Belgium)

Slides

Suppose we are given a multi-dimensional dataset. For every point in the dataset, we create a transaction, or cart, in which we store the k-nearest neighbors of that point with respect to one of the given dimensions. The resulting collection of carts can then be used to mine frequent itemsets; that is, sets of points that are frequently seen together in some dimensions. Experimentation shows that finding clusters, outliers, or cluster centers, and even performing subspace clustering, becomes easy on the cartified dataset using state-of-the-art techniques for mining interesting itemsets.
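The cartification procedure can be sketched in a few lines; the function names and toy data below are illustrative, not the authors' implementation:

```python
# Toy sketch of cartification: for each point and each dimension,
# build a "cart" holding the ids of that point's k nearest
# neighbours along that dimension, then count how often sets of
# points co-occur across carts. Illustrative only.

from itertools import combinations
from collections import Counter

def cartify(points, k):
    """Return one cart (frozenset of point ids) per point per dimension."""
    carts = []
    dims = len(points[0])
    for d in range(dims):
        for i, p in enumerate(points):
            # k nearest neighbours of point i along dimension d
            # (the point itself is included, at distance 0)
            order = sorted(range(len(points)),
                           key=lambda j: abs(points[j][d] - p[d]))
            carts.append(frozenset(order[:k]))
    return carts

def frequent_pairs(carts, minsup):
    """Pairs of point ids appearing together in at least minsup carts."""
    counts = Counter()
    for cart in carts:
        for pair in combinations(sorted(cart), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= minsup}

# Two tight clusters of two points each.
points = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 5.2)]
carts = cartify(points, k=2)
print(frequent_pairs(carts, minsup=4))
```

On this toy data, the two points of each cluster land in all of each other's carts, so the frequent pairs directly expose the clusters.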



Mining Ultra-Large Datasets by Kernel Machines – GPUs Implementation and Novel Algorithms
Vojislav (Vojo) Kecman, Virginia Commonwealth University (USA)

Slides

Today there is no part of human activity untouched by the need to collect data; examples include web pages, e-commerce, biology, medicine, public health, images, and so on, as well as all other fields of science and engineering. Once stored, these datasets are valuable sources of knowledge. However, their sheer size is increasing faster than the speed of CPUs, and extracting that knowledge will be difficult, if not impossible, without rethinking how we explore and analyze such an abundance of data. This rethinking must include both new hardware and novel software (algorithmic) components. The talk focuses on how support vector machines (known for their high accuracy and strong performance in high-dimensional sparse spaces) can be used efficiently to solve classification (pattern recognition) problems on ultra-large datasets, both by implementing parallel SVM algorithms on GPUs and by developing novel enclosing-sphere-based algorithms for SVMs. Both approaches readily solve problems with millions of samples in a reasonable time. (If the connection allows, a few examples will be run in real time on our server in the USA.)
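As a point of reference for the model family being scaled up, here is a toy linear SVM trained by stochastic sub-gradient descent on the hinge loss (Pegasos-style); the GPU-parallel and enclosing-sphere algorithms from the talk are considerably more sophisticated and are not reproduced here:

```python
# Toy linear SVM via stochastic sub-gradient descent on the hinge
# loss. A sketch of the underlying model only, not the talk's
# large-scale algorithms. All names and data are illustrative.

import random

def train_linear_svm(data, lam=0.01, epochs=200, seed=0):
    """data: list of (features, label) pairs with label in {-1, +1}."""
    rng = random.Random(seed)
    dim = len(data[0][0])
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)            # decaying learning rate
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # regularisation shrinks w every step; the hinge term
            # contributes only when the margin is violated
            w = [wi * (1 - eta * lam) for wi in w]
            if margin < 1:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

# Linearly separable toy data; the last component is a bias feature.
data = [([2.0, 2.0, 1.0], 1), ([1.5, 2.5, 1.0], 1),
        ([-2.0, -1.0, 1.0], -1), ([-1.0, -2.5, 1.0], -1)]
w = train_linear_svm(list(data))
print([predict(w, x) for x, _ in data])
```

Production systems would of course use a kernelized or GPU-backed solver; the point of the sketch is only the hinge-loss objective that all of them optimize.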



Information Network Analysis: Applications and Challenges
Osmar R. Zaïane, University of Alberta (Canada)

Slides

Conventional data is typically treated as a collection of independent observations, identically distributed in the space of possible attribute values.
In reality, data is packed full of relationships, which themselves carry attribute values. For instance, in a database of books, the books are considered independent, and when querying this database or recommending a book, only the attribute values are taken into account. Yet there are various relationships between books that could be considered, such as co-authorship or friendship between readers. These relationships have a significant influence and should therefore be reflected in data analysis. Much of today's data takes this form of networked entities, in fields such as biology, criminology, sociology, marketing, finance, etc. Studying information networks is also known as social network analysis, a field of study that attempts to understand and measure the relationships between entities in networked information.
We introduce social network analysis, illustrate some practical examples drawn from different application domains, examine some work done in this area, and present relevant challenges.
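As a small illustration of what such an analysis measures, consider degree centrality on a hypothetical co-readership graph of books (all titles and edges are invented for the example):

```python
# Degree centrality on a small, hypothetical co-readership graph:
# edges link books read by the same person. Pure Python for
# illustration; real analyses would use a graph library and
# richer measures (betweenness, communities, ...).

def degree_centrality(edges):
    """Fraction of other nodes each node is directly connected to."""
    nodes = {n for e in edges for n in e}
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    denom = len(nodes) - 1
    return {n: d / denom for n, d in degree.items()}

edges = [("Dune", "Foundation"), ("Dune", "Hyperion"),
         ("Foundation", "Hyperion"), ("Dune", "Neuromancer")]
centrality = degree_centrality(edges)
print(max(centrality, key=centrality.get))  # Dune is the most central book
```

A recommender that sees only attribute values would miss exactly this kind of relational signal.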



A learning framework for data objects with complex semantics
Zhi-Hua Zhou, Nanjing University (China)

Slides

In traditional learning settings, a data object is usually represented by a single feature vector, called an instance. Such a formulation has achieved great success; however, its utility is limited when handling data objects with complex semantics, where one object belongs simultaneously to multiple semantic categories. For example, an image showing a lion beside an elephant can be recognized simultaneously as an image of a lion, an elephant, the wild, or even Africa; the text document “Around the World in Eighty Days” can be classified simultaneously into multiple categories such as scientific novels, Jules Verne’s writings, or even books on traveling; and a web page introducing the Bird’s Nest Stadium can be categorized as a web page on the Olympics, sports, or even the city of Beijing. In many real tasks it is crucial to handle such data objects. In this talk we will introduce the MIML (Multi-Instance Multi-Label learning) framework, which has been shown to be promising for learning from data objects with complex semantics. In addition to some theoretical and algorithmic advances in this line of research, we will also introduce some real-world MIML applications.
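The MIML representation itself is simple to state: each object is a bag of instances paired with a set of labels. The sketch below shows this representation together with a deliberately naive nearest-bag labelling rule; it illustrates the setting only and is not one of the algorithms discussed in the talk:

```python
# MIML representation: each object is a bag of instance vectors
# paired with a set of labels. The nearest-bag rule below (copy
# the label set of the closest training bag) is a naive
# illustration, not the talk's algorithms. All data is invented.

def bag_distance(bag_a, bag_b):
    """Minimal pairwise squared Euclidean distance between two bags."""
    return min(sum((a - b) ** 2 for a, b in zip(xa, xb))
               for xa in bag_a for xb in bag_b)

def predict_labels(train, bag):
    """Copy the label set of the nearest training bag."""
    _, labels = min(train, key=lambda item: bag_distance(item[0], bag))
    return labels

# Each training object: (bag of instance vectors, set of labels).
train = [
    ([(1.0, 0.0), (0.9, 0.2)], {"lion", "Africa"}),
    ([(0.0, 1.0), (0.1, 0.8)], {"elephant", "Africa"}),
]
query = [(0.95, 0.1)]   # one-instance bag close to the first object
print(predict_labels(train, query))
```

Note how a single prediction yields a whole set of labels, which is exactly what the single-vector, single-label formulation cannot express.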