In a library of dictionaries, any given term should appear in only one dictionary in that library. (E.g., some terms and abbreviations may be climate-specific but not chapter-specific.) So in the case of the IPCC reports, we want to ensure that each extracted abbreviation is included only in the dictionary for the chapter in which it is most relevant.
Therefore, when dictionaries are created from individual IPCC chapters, it would be useful to include two new attributes on each entry:
source="" (which requires a step for the user to input the name of the source, or for it to be detected from the source document as a first step of term extraction)
count="" (the number of times the term occurs in that source)
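
A minimal sketch of what writing these two attributes could look like, in Python with the standard library only. The XML layout (a `dictionary` element containing `entry` elements with a `term` attribute) is an assumption about the dictionary format, and `build_dictionary`, the chapter name, and the sample text are hypothetical names made up for illustration; nothing here is existing docanalysis or py4ami API.

```python
import re
import xml.etree.ElementTree as ET


def build_dictionary(title, source, terms, text):
    """Build a <dictionary> whose entries carry source and count attributes.

    Hypothetical helper: the element and attribute layout is assumed.
    """
    dictionary = ET.Element("dictionary", {"title": title})
    for term in terms:
        # Whole-word, case-sensitive match, since abbreviations are case-sensitive.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        count = len(re.findall(pattern, text))
        ET.SubElement(
            dictionary,
            "entry",
            {"term": term, "source": source, "count": str(count)},
        )
    return dictionary


# Made-up sample text standing in for one chapter.
chapter_text = "GHG emissions rose. GHG budgets and SLR projections differ."
demo = build_dictionary(
    "chapter02_abbreviations", "chapter02", ["GHG", "SLR"], chapter_text
)
print(ET.tostring(demo, encoding="unicode"))
```

Here `source` is passed in by the caller, which corresponds to the "user inputs the name" option; detecting it from the document would replace that argument.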
This will then allow another script (to be created) to compare how often each term appears in each chapter, move the term into the most relevant dictionary (or suggest the move to a human), and remove it from all the others.
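
One possible shape for that comparison script, as a sketch only. It assumes the per-chapter counts have already been read into plain dicts keyed by source name, treats raw frequency as the measure of relevance, and sends ties to a human instead of choosing. The function names `assign_terms` and `prune` and the sample numbers are hypothetical.

```python
from collections import defaultdict


def assign_terms(chapter_counts):
    """Pick the source with the highest count for each term.

    Returns (assigned, needs_review): assigned maps term -> winning source,
    needs_review maps term -> list of tied sources for a human to resolve.
    """
    by_term = defaultdict(dict)
    for source, counts in chapter_counts.items():
        for term, count in counts.items():
            by_term[term][source] = count

    assigned, needs_review = {}, {}
    for term, per_source in by_term.items():
        best = max(per_source.values())
        winners = sorted(s for s, c in per_source.items() if c == best)
        if len(winners) == 1:
            assigned[term] = winners[0]
        else:
            needs_review[term] = winners
    return assigned, needs_review


def prune(chapter_counts, assigned):
    """Keep each assigned term only in its winning source; leave tied terms alone."""
    return {
        source: {
            term: count
            for term, count in counts.items()
            if assigned.get(term, source) == source
        }
        for source, counts in chapter_counts.items()
    }


# Made-up counts for two chapters.
chapter_counts = {
    "chapter02": {"GHG": 40, "SLR": 3, "AR6": 5},
    "chapter09": {"GHG": 4, "SLR": 57, "AR6": 5},
}
assigned, needs_review = assign_terms(chapter_counts)
pruned = prune(chapter_counts, assigned)
```

Raw counts favour longer chapters, so a real script might normalise by chapter length; that choice is left open here.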
This process must either be implemented identically in docanalysis and py4ami, or exist in only one of the two.