Fraunhofer-Gesellschaft

Master thesis »Real-Time Piano Multipitch Estimation using Convolutional Neural Networks«



Pay: Competitive
Location: Ilmenau/Thuringia
Employment type: Full-Time
  • Job Description

      Req#: 65914

      The Fraunhofer Institute for Digital Media Technology IDMT is part of the Fraunhofer-Gesellschaft. Headquartered in Ilmenau, Germany, the institute is internationally recognized for its expertise in applied electroacoustics and audio engineering, AI-based signal analysis and machine learning, and data privacy and security. At the headquarters, on the campus of “Technische Universität Ilmenau”, researchers work on technologies for robust, trustworthy AI-based analysis and classification of audio and video data. These are used, among other things, to monitor industrial production processes, in traffic monitoring, and in the media context, for example for automatic metadata extraction and audio manipulation detection. Another focus is the development of algorithms for virtual product development, intelligent actuator-sensor systems, and audio for the automotive sector. Around 70 employees currently work at Fraunhofer IDMT in Ilmenau.

      What you will do

      Automatic music transcription (AMT) aims to extract score-like representations from audio recordings [1, 2]. It remains one of the most challenging tasks in Music Information Retrieval (MIR). AMT algorithms can be categorized according to the instrument being transcribed and its role within a musical ensemble. A distinction can be made between the transcription of percussive instruments such as drums, and harmonic instruments that play either the melody part, the bass, or the polyphonic accompaniment. For polyphonic instruments such as the piano or guitar, multipitch estimation (MPE) aims to detect all pitches that are active in each time frame.
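
      As an illustration of what "frame-level" means here, the following minimal sketch converts a list of notes into per-frame sets of active pitches and compares an estimate against a reference. The note format, the function names `notes_to_frames` and `frame_scores`, and the example values are assumptions made for illustration; this is not the evaluation protocol used in [3].

```python
from typing import List, Set, Tuple

# Hypothetical note format: (onset in seconds, offset in seconds, MIDI pitch).
Note = Tuple[float, float, int]


def notes_to_frames(notes: List[Note], hop_s: float, n_frames: int) -> List[Set[int]]:
    """For each frame i, collect the pitches sounding at time i * hop_s."""
    frames: List[Set[int]] = [set() for _ in range(n_frames)]
    for onset, offset, pitch in notes:
        for i in range(n_frames):
            if onset <= i * hop_s < offset:
                frames[i].add(pitch)
    return frames


def frame_scores(reference: List[Set[int]], estimate: List[Set[int]]):
    """Frame-wise precision, recall, and F-measure, pooled over all frames."""
    tp = fp = fn = 0
    for ref, est in zip(reference, estimate):
        tp += len(ref & est)   # pitches correctly detected in this frame
        fp += len(est - ref)   # pitches detected but not in the reference
        fn += len(ref - est)   # reference pitches that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f_measure


# Two overlapping notes (C4 and E4) sampled every 0.25 s.
reference = notes_to_frames([(0.0, 0.5, 60), (0.25, 1.0, 64)], hop_s=0.25, n_frames=4)
# A made-up estimate with one missed pitch and one spurious pitch.
estimate = [{60}, {60}, {64, 67}, {64}]
precision, recall, f_measure = frame_scores(reference, estimate)
```

      Pooling the counts over all frames before computing the scores is one common convention; averaging per-frame scores would be an alternative design choice.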

      Music learning applications are an exciting use case for AMT algorithms: an instrumental or vocal performance is recorded and transcribed in real time to provide immediate feedback about the performance quality (and possible performance errors). Consequently, this scenario requires the AMT algorithm to run with low latency.
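
      To make the low-latency requirement more concrete, the sketch below estimates how much audio a frame-based model has to buffer before it can emit a prediction, and checks whether processing keeps up with the incoming audio. The function names and all parameter values (sample rate, window, hop, look-ahead) are hypothetical and are not taken from [3]; computation time is ignored in the first function.

```python
def buffering_latency_ms(sample_rate_hz: int, window_samples: int,
                         hop_samples: int, lookahead_frames: int) -> float:
    """Audio that must be buffered before a prediction can be computed.

    Simplified model: one full analysis window plus the future frames
    the network looks at, converted to milliseconds.
    """
    samples_needed = window_samples + lookahead_frames * hop_samples
    return 1000.0 * samples_needed / sample_rate_hz


def real_time_factor(processing_time_s: float, audio_duration_s: float) -> float:
    """Values below 1.0 mean the model processes audio faster than it arrives."""
    return processing_time_s / audio_duration_s


# Hypothetical configuration: 16 kHz audio, 1024-sample window,
# 256-sample hop, two frames of look-ahead.
latency = buffering_latency_ms(16000, 1024, 256, 2)
# Hypothetical measurement: 2.5 s of compute for 10 s of audio.
rtf = real_time_factor(2.5, 10.0)
```

      Under these assumed values the buffering latency is 96 ms and the real-time factor is 0.25; reducing the look-ahead or the window length lowers the latency at a possible cost in accuracy.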

      This thesis will study a recently proposed algorithm [3] for real-time MPE of piano recordings. The algorithm uses a convolutional neural network (CNN) architecture to predict both the active pitches played on a piano and their velocity (loudness) values. The author demonstrates that it outperforms other recently proposed MPE models based on the transformer architecture, such as [7].

      In particular, the following objectives should be accomplished in this Master's thesis:

      (1) Conduct a literature review of state-of-the-art MPE algorithms, with a special focus on deep-learning-based methods. The ability to run in real time should receive particular attention. Furthermore, the student should become familiar with the MAESTRO v3 [4], MAPS [5], and SMD-synth [6] datasets and their annotation schemes.

      (2) Re-implement the MPE model proposed in [3] and re-evaluate the results.

      (3) Optimize the model architecture for real-time use by further reducing its memory footprint and temporal latency.

      (4) In a more extensive evaluation, investigate the robustness of the MPE model to other pianos through cross-dataset evaluation on the MAPS and SMD-synth datasets. Optionally, reverberation and background noise can be simulated in the test set recordings to study their effect on MPE performance.
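
      For the optional noise experiment in objective (4), one possible way to simulate background noise is to add a noise recording to a test signal at a chosen signal-to-noise ratio (SNR). The following is a minimal sketch under that assumption; the function names `mix_at_snr` and `mean_power` and the toy signals are illustrative only, and reverberation (for example via convolution with a room impulse response) is not covered.

```python
import math
from typing import List


def mean_power(samples: List[float]) -> float:
    """Average squared amplitude of a signal."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(signal: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Scale the noise so that signal power / noise power matches snr_db, then add it."""
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    p_signal = mean_power(signal)
    p_noise = mean_power(noise)
    if p_noise == 0.0:
        raise ValueError("noise must not be silent")
    # Required noise power is p_signal / 10^(snr_db / 10); gain is an amplitude factor.
    gain = math.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(signal, noise)]


# Toy signals standing in for a piano recording and a noise recording.
clean = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
noisy = mix_at_snr(clean, noise, snr_db=20.0)

# Verify the achieved SNR from the added noise component.
added = [y - x for x, y in zip(clean, noisy)]
achieved_snr_db = 10.0 * math.log10(mean_power(clean) / mean_power(added))
```

      Evaluating the same model on the clean and the degraded test recordings at several SNR values would then show how quickly MPE performance drops as noise increases.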


      The student should document their work in a written thesis.

      References:

      [1] G. E. Poliner, D. P. W. Ellis, A. F. Ehmann, E. Gomez, S. Streich and B. Ong, "Melody Transcription From Music Audio: Approaches and Evaluation," in IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 4, pp. 1247-1256, May 2007, doi: 10.1109/TASL.2006.889797.

      [2] E. Benetos, S. Dixon, Z. Duan and S. Ewert, "Automatic Music Transcription: An Overview," in IEEE Signal Processing Magazine, vol. 36, no. 1, pp. 20-30, Jan. 2019, doi: 10.1109/MSP.2018.2869928.

      [3] A. Fernandez, "Onsets and Velocities: Affordable Real-Time Piano Transcription Using Convolutional Neural Networks," arXiv preprint arXiv:2303.04485v1, 2023 (https://arxiv.org/abs/2303.04485)

      [4] The MAESTRO Dataset (https://magenta.tensorflow.org/datasets/maestro)

      [5] MAPS - A piano database for multipitch estimation and automatic transcription of music (https://inria.hal.science/inria-00544155/en)

      [6] SMD-synth: A synthesized variant of the SMD MIDI-Audio Piano Music subset (https://zenodo.org/record/4637908)

      [7] C. Hawthorne, I. Simon, R. Swavely, E. Manilow and J. H. Engel, "Sequence-to-sequence piano transcription with transformers," in ISMIR Proceedings, 2021, pp. 246-253.

      What you bring to the table

      Very good skills in music signal processing, machine learning, and deep learning are required, as well as a passion for music.

      What you can expect

      • exciting market-related topics with complex issues to be solved – you can be actively involved in shaping the future
      • challenges at a high level; on top of that, we offer you excellent opportunities for professional and technical training
      • space to also implement your own ideas, such as in our quarterly open-topic idea contest
      • an excellent technical infrastructure
      • renowned partners and customers who work closely with you to develop the technologies of tomorrow
      • a very good work-life balance thanks to flexible working hours, a co-child office, the option of digital childcare in case of daycare shortages, and the possibility of mobile working, because family comes first – we know that
      • an open-minded and interested team, a tolerant and familiar atmosphere as well as regular team events
      • good transport connections and proximity to the state capital Erfurt
      • attractive special offers as part of Fraunhofer corporate benefits with numerous enterprise partners
      • new work and diversity are not just empty buzzwords, but an integral part of our corporate culture

      The weekly working time is 39 hours. This position is also available on a part-time basis. We value and promote the diversity of our employees' skills and therefore welcome all applications - regardless of age, gender, nationality, ethnic and social origin, religion, ideology, disability, sexual orientation and identity. Severely disabled persons are given preference in the event of equal suitability.

      With its focus on developing key technologies that are vital for the future and enabling the commercial utilization of this work by business and industry, Fraunhofer plays a central role in the innovation process. As a pioneer and catalyst for groundbreaking developments and scientific excellence, Fraunhofer helps shape society now and in the future.

      Interested? Apply online now. We look forward to getting to know you!

      Professional queries:

      Jakob Abeßer
      jakob.abesser@idmt.fraunhofer.de

      Andrew McLeod
      andrew.mcleod@idmt.fraunhofer.de

      Questions about the application process:

      Katrin Pursche
      katrin.pursche@idmt.fraunhofer.de

      Fraunhofer Institute for Digital Media Technology IDMT

      www.idmt.fraunhofer.de

      Requisition Number: 65914 Application Deadline:

  • About the company

      The Fraunhofer Society is a German research organization with 76 institutes spread throughout Germany, each focusing on different fields of applied science.
