Spanish language support #1

Closed
MiClobiMC opened this issue Mar 1, 2021 · 19 comments
Labels
help wanted Extra attention is needed

Comments

@MiClobiMC

Could you make a tutorial on how to install it? Does it work in Spanish?

@nbusser
Collaborator

nbusser commented Mar 1, 2021

Hi, this project is only a library.
You can find applications here:

We will soon write documentation for these applications and make the GUI stable.

Unfortunately, only French is supported right now. Once the GUI application is stable, we will look into extending the library to other languages. Given our architecture, it should not be too complicated.

@pop123123123
Owner

If you want to try it in your language anyway: when you install Montreal Forced Aligner, follow the instructions but take the Spanish acoustic and pronunciation models instead of the French ones indicated in this repo. This should work out of the box.

@nbusser
Collaborator

nbusser commented Apr 7, 2021

Hi @MiClobiMC !

Yesterday, we added language support to the library.

Sadly, the only ready-made dictionaries we could find were for English and German.

However, a pretrained Spanish model exists for MFA. The only thing missing is the pronunciation dictionary, which breaks each word of the language down into phonemes.

We wrote a guide for adding an unsupported language. We have no real idea how long, tedious or difficult it is to create such a dictionary. Hopefully, the task is fairly easy.

You should definitely ask Michael McAuliffe about this. I'm sure he will guide you through the task. Contact him at michael.e.mcauliffe@gmail.com.

If you want to add support for Spanish, please keep us posted.

@MiClobiMC
Author

Hello, I just followed the tutorial and the installation went very well, but it only works in English. I tested it in Spanish, and since there is no Spanish dictionary, it gives an error.

@nbusser
Collaborator

nbusser commented Apr 8, 2021

You should then contact Michael McAuliffe and ask him for guidelines on creating a Spanish dictionary.

@nbusser changed the title from "Tutorial install pls" to "Spanish language support" on Apr 9, 2021
@MiClobiMC
Author

I have spoken with Michael. He tells me that I have to generate the dictionary, but the problem is that I am on Windows and cannot do it.

@nbusser
Collaborator

nbusser commented Apr 12, 2021

Can you tell us more about what Michael actually said about generating the dictionary?
We guess that Michael told you to use g2p to generate your dictionary.

Workaround

Here is the error you likely get when you give the IPA Spanish dictionary to the GlobalPhone MFA Spanish model:

dictionary phones: {'k', 'd', 'a', 's', 'f', 'm', 'i', 't', 'n', 'b', 'ɲ', 'p', 'j', 'ɾ', 'l', 'ɡ', 't͡ʃ', 'r', 'u', 'o', 'ɟ͡ʝ', 'e', 'x', 'w'}
model phones: {'i+', 'd', 'k', 'a', 'a+', 's', 'aI', 'f', 'u+', 'm', 'i', 'L', 't', 'z', 'tS', 'n~', 'e+', 'n', 'b', 'p', 'G', 'j', 'g', 'l', 'ng', 'o+', 'V', 'r', 'rf', 'eU', 'u', 'o', 'aU', 'oI', 'e', 'D', 'T', 'x', 'eI', 'w'}

You can see that the phoneme symbols of the dictionary (dictionary phones) differ from the phoneme symbols the MFA model was trained with (model phones).

Four phonemes, ɲ, ɾ, t͡ʃ and ɟ͡ʝ, appear in the IPA dictionary but are not known to the MFA model.
You then only need to replace these four phonemes with their MFA equivalents every time they appear in the dictionary.
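A minimal sketch of that replacement, assuming the dictionary is a plain-text file with one entry per line in the form `WORD ph1 ph2 ...` (whitespace-separated). The function name `replace_phones` is hypothetical, and the target symbols in `REPLACEMENTS` are illustrative guesses picked from the model phone list, not confirmed equivalents; check each one before relying on it.

```python
# Hypothetical mapping: IPA symbol -> symbol from the model phone list.
# The right-hand values are guesses for illustration only; verify them.
REPLACEMENTS = {
    "ɲ": "n~",
    "ɾ": "r",
    "t͡ʃ": "tS",
    "ɟ͡ʝ": "L",
}


def replace_phones(line, replacements=REPLACEMENTS):
    """Rewrite one 'WORD ph1 ph2 ...' line, leaving the word untouched."""
    parts = line.split()
    if not parts:
        return line
    word, phones = parts[0], parts[1:]
    # Replace whole phone tokens only, so letters inside the word
    # itself (first column) are never modified.
    return " ".join([word] + [replacements.get(p, p) for p in phones])


print(replace_phones("NIÑO n i ɲ o"))
```

Working token by token rather than with a global search-and-replace in a text editor avoids accidentally changing characters inside the words themselves.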

Warning

The IPA dictionary you found looks great.
However, you can see that it uses only 24 phonemes, while the MFA model was trained with 40.
This means that if you change the phonemes of the IPA dictionary to match those of the MFA model, you could lose a lot of precision.

Please describe the process Michael gave you, so we can think about it (we will probably change our architecture if there is a way to make it more dynamic).

@MiClobiMC
Author

estoes

@nbusser
Collaborator

nbusser commented Apr 13, 2021

We will likely change the library to use dynamically generated g2p dictionaries instead of static dictionaries.
Given the time we can currently invest in the project, it will probably take more than a month.

Until that change is made, you should use the workaround I described in my previous message. It should be simple and quick to do. Tell us if you have any questions or run into difficulties.

@MiClobiMC
Author

I replaced the phonemes as described above, but I don't have the consonant and vowel lists for Spanish.

@nbusser
Collaborator

nbusser commented Apr 13, 2021

You have to create this small dict.
It's very simple:

  • put the vowel phonemes in the line VOWEL
  • put the consonant phonemes in the line CONSONANT
  • add the line SPACE sp

Here is an example
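The linked example is not reproduced here. As a rough sketch of the three rules above, the following builds the three lines from a set of phonemes. It assumes each line is a label followed by space-separated phonemes, as the `SPACE sp` line suggests; the exact layout the library expects is not confirmed by this thread. The function name and the sample phoneme sets are hypothetical.

```python
def build_phone_classes(phones, vowels):
    """Return the VOWEL, CONSONANT and SPACE lines as a list of strings."""
    vowel_list = sorted(p for p in phones if p in vowels)
    consonant_list = sorted(p for p in phones if p not in vowels)
    return [
        "VOWEL " + " ".join(vowel_list),
        "CONSONANT " + " ".join(consonant_list),
        "SPACE sp",
    ]


# Illustrative sample only; use the phonemes of your own dictionary.
phones = {"a", "e", "i", "o", "u", "b", "d", "k", "l", "m", "n", "s", "t"}
vowels = {"a", "e", "i", "o", "u"}
for line in build_phone_classes(phones, vowels):
    print(line)
```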

@MiClobiMC
Author

MiClobiMC commented Apr 13, 2021

I did the phonemes replacement, but I still have this problem:
fatalerror

@nbusser
Collaborator

nbusser commented Apr 13, 2021

Did you use the same dictionary?
It says that the phonemes N and Z, for example, are in your dictionary, but I cannot find them when browsing it.
Can you explain what you did when replacing the phonemes?

@MiClobiMC
Author

MiClobiMC commented Apr 13, 2021

Well, I replaced the phonemes that you told me to replace using Notepad, for example: ɲ = N, ɾ = D, t͡ʃ = J, ɟ͡ʝ = Y
consonantes

asi

@nbusser
Collaborator

nbusser commented Apr 13, 2021

OK, you followed the correct process.

However, you made some mistakes: you can see in the list of model phonemes in my previous post that, for example, the phoneme N does not exist (while the phoneme n does).

It looks like you also inserted other undefined phonemes, such as Z or β.

Make sure that dictionary phones is a subset of model phones in the error message.
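The subset condition can be checked with a plain set difference. This is a sketch, not part of the library or of MFA: the function names and the sample sets are hypothetical. Printing code points helps because look-alike characters, such as ASCII `g` (U+0067) and IPA `ɡ` (U+0261), are different symbols to the program.

```python
def unknown_phones(dictionary_phones, model_phones):
    """Return the dictionary phones that are not in the model phone set."""
    return set(dictionary_phones) - set(model_phones)


def describe(phone):
    """Show the code points of a phone so look-alikes can be told apart."""
    return " ".join(f"U+{ord(ch):04X}" for ch in phone)


# Illustrative sample sets; paste the two sets from the error message instead.
model = {"n", "g", "n~", "tS"}
dictionary = {"n", "N", "\u0261"}  # "\u0261" is IPA script g, not ASCII "g"

for phone in sorted(unknown_phones(dictionary, model)):
    print(repr(phone), describe(phone))
```

The dictionary is usable with the model only when `unknown_phones` returns an empty set.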

@MiClobiMC
Author

MiClobiMC commented Apr 13, 2021

It says the word was not recognized, even though the word is in the video and in the dictionary.
fixbut

@nbusser
Collaborator

nbusser commented Apr 14, 2021

The program doesn't find HOLA in the dictionary. Are you sure you changed config.json correctly?
One thing you can try is deleting the .downloads and .subs folders and the save.pckl file.
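A small sketch of that cleanup, assuming the two folders and the file sit directly inside the directory you run the application from (the thread does not say where they live). The function name `clear_cache` is hypothetical.

```python
import shutil
from pathlib import Path


def clear_cache(project_dir):
    """Delete .downloads, .subs and save.pckl under project_dir if present.

    Returns the names of the entries that were actually removed.
    """
    root = Path(project_dir)
    removed = []
    for name in (".downloads", ".subs"):
        folder = root / name
        if folder.is_dir():
            shutil.rmtree(folder)
            removed.append(name)
    save_file = root / "save.pckl"
    if save_file.is_file():
        save_file.unlink()
        removed.append("save.pckl")
    return removed
```

Calling `clear_cache(".")` from the application's working directory removes whatever is there and ignores anything that is missing. The deletion is permanent, so double-check the path first.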

Since your initial problem is solved, we will close this issue.

If you run into further issues related to using the CLI application, I suggest looking them up in the restrictions/workaround section of the CLI's dedicated GitHub page.
If you cannot find your answer there, please open an issue on the CLI application's GitHub issues page.
We will be happy to help you with it.

Don't hesitate to give us feedback.

@nbusser nbusser closed this as completed Apr 14, 2021
Repository owner deleted a comment from MiClobiMC Apr 14, 2021
Repository owner deleted a comment from MiClobiMC Apr 14, 2021
Repository owner deleted a comment from MiClobiMC Apr 14, 2021
@nbusser nbusser added the help wanted Extra attention is needed label Apr 14, 2021
@nbusser
Collaborator

nbusser commented Apr 14, 2021

Conclusion

@MiClobiMC's problem was that the words in his dictionary were not in upper case, which is mandatory.
In addition, MFA did not detect unrecognized non-ASCII phoneme symbols. So even if MFA produces no error when analyzing the videos, that does not necessarily mean the dictionary is valid.
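Since MFA may stay silent about such entries, a quick check before running it can help. The sketch below assumes one `WORD ph1 ph2 ...` entry per line and a known set of model phones; the function name and sample data are hypothetical, and this is not a replacement for MFA's own validation.

```python
def check_entry(line, model_phones):
    """Return a list of problems found in one 'WORD ph1 ph2 ...' line."""
    parts = line.split()
    if not parts:
        return []
    word, phones = parts[0], parts[1:]
    problems = []
    if word != word.upper():
        problems.append(f"word {word!r} is not upper case")
    for phone in phones:
        if phone not in model_phones:
            # Flag non-ASCII symbols explicitly, since those went undetected.
            kind = "ASCII" if phone.isascii() else "non-ASCII"
            problems.append(f"unknown {kind} phone {phone!r}")
    return problems


# Illustrative sample only; use the real model phone set.
model_phones = {"o", "l", "a", "n~"}
print(check_entry("hola o l a", model_phones))
print(check_entry("HOLA o l \u03b2", model_phones))  # "\u03b2" is the letter beta
```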

Spanish has been added to the SM-Dictionaries repository in this commit.
