Yiddish transliteration does not conform to YIVO transliteration #5

@j0ma

Thanks for a great library! While using it for Yiddish, I noticed that some of the transliterations do not conform to the YIVO romanization standard.

To pinpoint what kind of errors uroman is making, I conducted a romanization experiment using the data from Saleva (2020) and another library for Yiddish romanization called yiddish.

Here are some benchmark numbers using accuracy and mean F1 score as defined in Proceedings of the Seventh Named Entities Workshop:

| library | mean_f1 | accuracy |
|---------|---------|----------|
| uroman  | 0.937   | 0.458    |
| yiddish | 0.990   | 0.936    |
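For readers who want to reproduce numbers like these, here is a minimal sketch of the two metrics. It assumes a single reference per word and follows my understanding of the Named Entities Workshop definitions: accuracy is exact match, and F1 is computed per word from the longest common subsequence (LCS) between candidate and reference. The function names (`lcs_length`, `word_f1`, `evaluate`) are made up for illustration; this is not the script used for the table above.

```python
from typing import Dict, Sequence


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def word_f1(candidate: str, reference: str) -> float:
    """LCS-based F1 between one candidate and one reference romanization."""
    if not candidate or not reference:
        # Assumption: two empty strings count as a perfect match.
        return 1.0 if candidate == reference else 0.0
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


def evaluate(candidates: Sequence[str], references: Sequence[str]) -> Dict[str, float]:
    """Exact-match accuracy and mean F1 over aligned candidate/reference lists."""
    if len(candidates) != len(references) or not references:
        raise ValueError("need two non-empty lists of equal length")
    n = len(references)
    accuracy = sum(c == r for c, r in zip(candidates, references)) / n
    mean_f1 = sum(word_f1(c, r) for c, r in zip(candidates, references)) / n
    return {"accuracy": accuracy, "mean_f1": mean_f1}


# Two words from the diff in this issue: one wrong, one right.
scores = evaluate(["aforyzm", "apetit"], ["aforizm", "apetit"])
print(scores)
```

This also shows why mean F1 can stay high while accuracy is low: a single wrong character in a seven-letter word still scores 6/7 on F1 but counts as a full miss for accuracy.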

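The diffs for what uroman gets wrong can be found here. Lines starting with - are the expected YIVO romanizations, and lines starting with + are uroman's output. Many seem to be i/y mismatches as well as Hebrew expansion errors: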

-aforizm
+aforyzm
-aparatshik
-apteyk
-apteyker
-apikoyres
+aparatshyk
+aptyyk
+aptyyker
+apykurs
-apetit
+apetyt
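A diff in this shape can be produced with Python's standard `difflib`. The sketch below is only an illustration of the idea, not the exact command used for the linked diffs; the helper name `changed_lines` and the three-word sample are assumptions for the example.

```python
import difflib
from typing import List


def changed_lines(expected: List[str], actual: List[str]) -> List[str]:
    """Return only the -/+ lines of a unified diff, without headers or context."""
    diff = difflib.unified_diff(expected, actual, lineterm="", n=0)
    return [
        line
        for line in diff
        # Keep removed/added lines; skip the "---"/"+++" file headers
        # and the "@@" hunk markers.
        if line[:1] in "-+" and not line.startswith(("---", "+++"))
    ]


# Hypothetical sample: expected YIVO forms vs. romanizer output.
expected = ["aforizm", "alef", "apetit"]
actual = ["aforyzm", "alef", "apetyt"]

for line in changed_lines(expected, actual):
    print(line)
```

When several consecutive words differ, the diff emits all of the - lines for that run first and then all of the + lines, which is why some entries above appear in groups of four rather than in pairs.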

Would it be possible to make the -l yid flag produce output that conforms to the YIVO romanization standard?
As far as I'm aware, it's by far the most widely used romanization format for Yiddish.

Thanks!
