Job offer (Non-permanent)

Post-doctoral position in dissimilarity learning for interactive clustering

Overview

The research performed in SCHISM will focus on interactive pattern-mining and interactive clustering in a chemoinformatics context. Interactive data-mining is a recent research direction that breaks with the older paradigm of specifying parameter settings for algorithms, letting the algorithm run, interpreting the results, and, based on this interpretation, adjusting the parameter settings to restart the process. Interactive data-mining, by contrast, proposes partial or preliminary results to the user, collects their feedback, and uses this feedback to bias the mining process going forward.

The overall goal of SCHISM is to develop a robust approach to interactive data-mining that integrates both pattern-mining and clustering, and to deliver a prototype that allows users to launch pattern-mining or clustering algorithms, visualize the results, give feedback, and rerun mining operations that take the given feedback into account.

Mission

The general idea of the research work carried out at the LITIS lab for the SCHISM project is to rely on dissimilarity representations both to infer relevant clusterings and to propose tools for visualization, analysis and expert feedback in the form of constraints on the clusters. The goal is to define a common representation so that the data and constraints are projected into smaller spaces while preserving the properties of the neighborhood. To do so, a preliminary study has been carried out over the last few months on using Random Forest (RF) models to measure dissimilarities and on using these dissimilarities afterwards to infer relevant clusterings.
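For readers unfamiliar with RF dissimilarities, the following is a minimal sketch of one common construction, not necessarily the one used in the preliminary study: two samples are considered close when they fall into the same leaf in many trees of the forest. The function name `rf_dissimilarity` and the leaf assignments are hypothetical; in practice the leaf indices could come, for example, from the `apply` method of a fitted scikit-learn forest.

```python
from itertools import combinations


def rf_dissimilarity(leaf_ids):
    """Return a symmetric dissimilarity matrix from per-tree leaf assignments.

    leaf_ids[i][t] is the index of the leaf reached by sample i in tree t.
    The dissimilarity is 1 minus the fraction of trees in which two samples
    share a leaf (an assumed, proximity-style definition).
    """
    n = len(leaf_ids)
    n_trees = len(leaf_ids[0])
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        shared = sum(1 for a, b in zip(leaf_ids[i], leaf_ids[j]) if a == b)
        d[i][j] = d[j][i] = 1.0 - shared / n_trees
    return d


# Hypothetical leaf assignments: 4 samples, 3 trees.
leaves = [
    [0, 2, 1],
    [0, 2, 1],
    [0, 3, 4],
    [5, 3, 4],
]
D = rf_dissimilarity(leaves)
```

The resulting matrix `D` can then be passed to any clustering method that accepts precomputed dissimilarities.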

The post-doctoral researcher will be in charge of (i) deepening the use of RF dissimilarities for deriving a relevant clustering and (ii) proposing mechanisms of analysis and interaction with the expert. More specifically, this latter task must meet a two-fold objective, namely allowing the expert to select and analyze (sub-)clusterings in order to provide feedback, and translating this feedback into dissimilarity constraints in order to adjust the result. This research work will therefore involve becoming familiar with Random Forest methods and interactive clustering.
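As an illustration of how expert feedback might be turned into dissimilarity constraints, the sketch below assumes the feedback arrives as must-link and cannot-link pairs, a common convention in constrained clustering that the project description does not prescribe. The function name `apply_feedback` and the example matrix are hypothetical.

```python
def apply_feedback(d, must_link=(), cannot_link=()):
    """Return a copy of the dissimilarity matrix d adjusted by expert feedback.

    must_link pairs are pulled together (dissimilarity 0); cannot_link pairs
    are pushed apart (set to the largest dissimilarity found in d).
    This is an assumed, deliberately simple adjustment rule.
    """
    adjusted = [row[:] for row in d]  # leave the original matrix untouched
    far = max(max(row) for row in d)
    for i, j in must_link:
        adjusted[i][j] = adjusted[j][i] = 0.0
    for i, j in cannot_link:
        adjusted[i][j] = adjusted[j][i] = far
    return adjusted


# Hypothetical dissimilarities between three molecules.
D0 = [
    [0.0, 0.2, 0.9],
    [0.2, 0.0, 0.4],
    [0.9, 0.4, 0.0],
]
D1 = apply_feedback(D0, must_link=[(1, 2)], cannot_link=[(0, 1)])
```

Re-running the clustering on `D1` instead of `D0` is one simple way for the result to reflect the feedback; a real system would more likely propagate the constraints to neighboring pairs as well.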

Candidate profile

The successful applicant will:

1. possess, or be on track to complete, a PhD in computer science or applied mathematics with a focus on machine learning or data-mining (fluency in written and spoken English or French is essential);

2. have strong programming skills (Java, Python, etc.) and an in-depth understanding of statistics and machine learning;

3. have a productive publication record;

4. have a strong work ethic and time management skills, along with the ability to work independently and within a multidisciplinary team as required.

Salary will be in line with European and French guidelines with respect to years of research experience. Additional funding is available for travel.

Required skills

Organization