Structure-informed Positional Encoding for Music Generation

Manvi Agarwal; Changhong Wang; Gaël Richard

Conference paper. Year: 2024

Manvi Agarwal

Role: Author
PersonId: 1345834
ORCID: 0000-0002-8151-2176

Signal, Statistique et Apprentissage

Département Images, Données, Signal

Changhong Wang

Role: Author
PersonId: 1237245
IdHAL: changhong-wang

Signal, Statistique et Apprentissage

Département Images, Données, Signal

Gaël Richard

Role: Author
PersonId: 14146
IdHAL: gael-richard
IdRef: 094977208

Signal, Statistique et Apprentissage

Département Images, Données, Signal

Abstract

Music generated by deep learning methods often suffers from a lack of coherence and long-term organization. Yet, multi-scale hierarchical structure is a distinctive feature of music signals. To leverage this information, we propose a structure-informed positional encoding framework for music generation with Transformers. We design three variants in terms of absolute, relative and non-stationary positional information. We comprehensively test them on two symbolic music generation tasks: next-timestep prediction and accompaniment generation. As a comparison, we choose multiple baselines from the literature and demonstrate the merits of our methods using several musically-motivated evaluation metrics. In particular, our methods improve the melodic and structural consistency of the generated pieces.

Keywords

symbolic music generation; Transformers; music structure; positional encoding

Domains

Sound [cs.SD]; Artificial Intelligence [cs.AI]; Machine Learning [cs.LG]

Main file

svbwdvrdnrztpzxgdsckkhqxkjbjpfzx.pdf (1.16 MB)

Origin: Files produced by the author(s)

https://hal.science/hal-04432659

Submitted on: Wednesday, February 28, 2024, 13:22:50

Last modified on: Friday, March 1, 2024, 03:09:57

Dates and versions

hal-04432659, version 1 (15-02-2024)

hal-04432659, version 2 (20-02-2024)

hal-04432659, version 3 (28-02-2024)

Identifiers

HAL Id: hal-04432659, version 3
arXiv: 2402.13301

Cite

Manvi Agarwal, Changhong Wang, Gaël Richard. Structure-informed Positional Encoding for Music Generation. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2024, Seoul, South Korea. ⟨hal-04432659v3⟩

Collections

INSTITUT-TELECOM PARISTECH GENCI LTCI IDS S2A IP_PARIS

209 views

71 downloads
