Yiddish transliteration does not conform to YIVO transliteration

Thanks for a great library! While using it for Yiddish, I noticed that some of the transliterations do not conform to the [YIVO romanization standard](https://yivo.org/yiddish-alphabet). 

To pinpoint what kind of errors `uroman` is making, I conducted a [romanization experiment](https://github.com/j0ma/yiddish-transliteration-bakeoff) using the data from [Saleva (2020)](https://aclanthology.org/2020.lrec-1.119/) and another library for Yiddish romanization called [`yiddish`](https://www.github.com/ibleaman/yiddish).

Here are some benchmark numbers using accuracy and mean F1 score as defined in [Proceedings of the Seventh Named Entities Workshop](https://www.aclweb.org/anthology/W18-2408):

|library            |mean_f1           |accuracy          |
|-------------------|------------------|------------------|
|uroman             |0.937             |0.458             |
|yiddish            |0.990             |0.936             |
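For anyone who wants to sanity-check these numbers, here is a minimal sketch of how the two metrics can be computed. It assumes one reference romanization per word and takes the F1 to be based on the longest common subsequence (LCS) between candidate and reference. The function names (`lcs_length`, `f1`, `evaluate`) are made up for illustration and this is not the actual evaluation script from the bakeoff repository.

```python
from typing import Dict, List, Tuple


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def f1(candidate: str, reference: str) -> float:
    """LCS-based F1 between one candidate and one reference string."""
    if not candidate or not reference:
        # Assumption: two empty strings count as a perfect match.
        return 1.0 if candidate == reference else 0.0
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


def evaluate(pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """pairs holds (candidate, reference) tuples; returns both metrics."""
    n = len(pairs)
    return {
        "accuracy": sum(c == r for c, r in pairs) / n,
        "mean_f1": sum(f1(c, r) for c, r in pairs) / n,
    }


if __name__ == "__main__":
    # (library output, expected YIVO form), taken from the diff in this issue
    print(evaluate([("aforyzm", "aforizm"), ("apetyt", "apetit")]))
```

Accuracy only rewards exact matches, while the F1 gives partial credit for near misses, which would explain why a library can score a high mean F1 and a low accuracy at the same time.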

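The diffs for what uroman gets wrong can be found [here](https://github.com/j0ma/yiddish-transliteration-bakeoff/blob/main/diffs/uroman). In the excerpt below, `-` lines are the expected YIVO forms and `+` lines are the uroman output. Many of the errors seem to be `i`/`y` mismatches (e.g. `aforizm` comes out as `aforyzm`), and others are expansion errors in Hebrew-origin words (e.g. `apikoyres` comes out as `apykurs`):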

```
-aforizm
	+aforyzm
-aparatshik
-apteyk
-apteyker
-apikoyres
	+aparatshyk
	+aptyyk
	+aptyyker
	+apykurs
-apetit
	+apetyt
```
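To get a rough count of which character-level confusions dominate, something like the sketch below could be run over the full diff. It uses `difflib.SequenceMatcher` from the Python standard library. Note that pairing each `-` line with a `+` line by position is my assumption about how the diff lines correspond, and `PAIRS` and `substitutions` are illustrative names, not part of either library.

```python
from collections import Counter
from difflib import SequenceMatcher

# (expected YIVO form, uroman output), paired by position in the diff above.
# The pairing is an assumption about how the diff lines line up.
PAIRS = [
    ("aforizm", "aforyzm"),
    ("aparatshik", "aparatshyk"),
    ("apteyk", "aptyyk"),
    ("apteyker", "aptyyker"),
    ("apikoyres", "apykurs"),
    ("apetit", "apetyt"),
]


def substitutions(expected: str, actual: str) -> Counter:
    """Count the non-matching (expected, actual) fragments between two strings."""
    counts = Counter()
    matcher = SequenceMatcher(a=expected, b=actual, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            counts[(expected[i1:i2], actual[j1:j2])] += 1
    return counts


if __name__ == "__main__":
    totals = Counter()
    for expected, actual in PAIRS:
        totals.update(substitutions(expected, actual))
    # Most frequent confusions first
    for (exp_frag, act_frag), n in totals.most_common():
        print(f"{exp_frag!r} -> {act_frag!r}: {n}")
```

The alignment from `SequenceMatcher` is heuristic, so the tallies are only a guide, but it should be enough to separate the systematic `i`/`y` confusion from the less regular errors on Hebrew-origin words.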

Would it be possible to implement the `-l yid` flag such that the output conforms to the YIVO romanization standard?
As far as I'm aware, it's by far the most widely used romanization scheme for Yiddish.

Thanks!
