Postdoctoral Researcher in Artificial Intelligence and Natural Language Processing (SCAI/BnF research program)

The Sorbonne Center for Artificial Intelligence (SCAI) of Sorbonne University and the BnF offer a 12-month (renewable) postdoctoral contract in artificial intelligence and natural language processing.

Who are we?

Sorbonne University is a multidisciplinary research university created on January 1, 2018 by merging the universities Paris-Sorbonne and UPMC. It provides training to 54,000 students, including 4,700 doctoral students and 10,200 international students, and employs 6,300 academics, teacher-researchers and researchers and 4,900 library, administrative, technical, social and health staff. Its budget is €670 million. Sorbonne University has first-rate potential, is primarily located in the heart of Paris, and extends its presence to more than twenty sites in Île-de-France and in the regions. Sorbonne University is organized into three faculties: Humanities, Science & Engineering, and Medicine. These faculties have significant autonomy to implement the university’s strategy within their own perimeter, based on a contract of objectives and resources. University governance is primarily dedicated to promoting the university’s strategy, steering, developing partnerships and diversifying resources.

Presentation of the project

In a national and international context marked by competition around artificial intelligence, Sorbonne University has created the “Sorbonne Center for Artificial Intelligence” (SCAI), which brings together in a single location, in the heart of the Latin Quarter, a strategic range of disciplines in modern artificial intelligence. The ambition of SCAI is to contribute significantly to the excellence of interdisciplinary research in artificial intelligence by promoting exchanges between professors, researchers, teachers, students and industry partners.

The research project described below is part of the strategic partnership between Sorbonne University and the BnF, which brings together the expertise of the MLIA team of ISIR and that of the BnF in order to develop joint research work on recommender systems.

The Bibliothèque nationale de France (BnF) is one of the largest heritage libraries in the world. Its mission is to collect, catalog, preserve, enrich and communicate the national documentary heritage. For many years now, the BnF has been involved in ambitious digitization programs for its collections, to which can now be added the massive arrival of natively digital collections. The BnF is constantly enriching its digital heritage, the mass, variety and growth rate of which require new processing and consultation tools. To enable as many people as possible to discover and appropriate this heritage, the BnF has been involved in artificial intelligence (AI) technologies for several years.

Main activities

Gallica, the digital library of the BnF, contains nearly 10 million digitized documents that are freely accessible online (18.5 million visits per year). However, most users do not know that Gallica contains not only printed documents, but also images, sound recordings, videos, and 3D objects. In satisfaction surveys, only a minority of users consider the search engine’s answers to be relevant, and a majority would like better guidance in their searches. A recommendation system should be able to help users find their way through the mass of collections and improve the visibility of the least-known ones. In this project, the BnF is committed to adopting a resolutely ethical approach. The use of user logs must respect users’ privacy and guarantee both the relevance and the transparency of the algorithms, avoiding the risk of filter bubbles. Interface design is also at the heart of the approach: a trustworthy system depends on a good user experience and on the diversity and relevance of the proposed recommendations. Three lines of thought emerge:

  1. Based on the available data, including both user logs and collection descriptions, how can predictive algorithms be developed?
  2. How can diversity be integrated into the recommendation algorithm while leaving users the option to adjust their own serendipity threshold?
  3. How can user trust be built through algorithm design and audit?

Main missions

This project consists of working on information access in the Gallica library, from the perspective of machine learning and deep learning methods. The research axes concern (1) the analysis and indexing of textual documents, (2) the analysis of user traces and (3) recommendation systems. We are particularly interested in multimodal methods that make it possible to contextualize a document or a query based on user interactions.

The successful candidate will be responsible for:

  • Implementing models to learn the semantics of textual data for the purpose of indexing them.
  • Developing algorithms based on representation learning methods to effectively combine text and user traces.
  • Reporting and presenting development work in a clear and effective manner, both for discussion with BnF experts and for writing machine learning publications.

The printed book collection will be the main focus of the program described above, but an extension to other collections with textual descriptors (in particular iconographic collections) may be considered.


A PhD degree in Computer Science or equivalent is required, as well as a strong scientific record, notably in NLP and/or Recommender Systems and/or Information Retrieval. Experience with international research projects and with applications in the humanities and social sciences (SHS) would be an asset.

General information

  • Location: Pierre and Marie Curie campus of Sorbonne University and Datalab of the BnF
  • Contract: 12-month fixed-term contract with the possibility of an extension
  • Expected hiring date: as soon as possible
  • Workload: full time
  • Desired experience: 1 to 3 years
  • Salary in line with experience

Main contacts

  • Laure Soulier, MCF in computer science at Sorbonne University, MLIA team, ISIR.
  • Emmanuelle Bermès, Scientific and Technical Assistant to the Director of Services and Networks at the BnF.
  • Jean-Philippe Moreux, Scientific expert for Gallica at the BnF.

Supervision: NO
Project management: YES

Knowledge and skills

A strong background in natural language processing or text analysis is essential, and good programming skills are required. Experience with recommender systems is assumed. An understanding of the ethical issues raised by such systems is also expected. Language: knowledge of French is not required but is strongly preferred.


Applications (CV + motivation + references) must be sent by e-mail to with a copy to

  • Body: 12-month postdoctoral contract (renewable)
  • Affiliation: UMR 7222 ISIR
  • Keywords: machine learning, explainability, databases, computer science, applied mathematics, statistics, natural language processing, recommendation