Merged
25 commits
658ae5e
add JSON output
lfoppiano Sep 13, 2025
0af3fbd
finalize output - remove duplicated paragraphs
lfoppiano Sep 13, 2025
f54feb1
add markdown output
lfoppiano Sep 13, 2025
21c6cb5
Update grobid_client/format/TEI2LossyJSON.py
lfoppiano Oct 12, 2025
db7d457
Avoid leaking opened files
lfoppiano Oct 13, 2025
af228ca
Avoid leaking opened files
lfoppiano Oct 13, 2025
29cad08
Merge branch 'master' into feature/json_output
lfoppiano Oct 13, 2025
8e7d9da
move import
lfoppiano Oct 13, 2025
9c1b2ee
avoid problems with duplicated references in the same sentence
lfoppiano Oct 13, 2025
93d7367
if JSON output does not exist, create it even if the TEI was not prod…
lfoppiano Oct 29, 2025
19c6ff0
typos
lfoppiano Oct 29, 2025
3fc43db
consistently using pathlib
lfoppiano Oct 29, 2025
51f733f
Merge branch 'feature/json_output' into feature/markdown-output
lfoppiano Oct 29, 2025
190b264
Update markdown, fix author/affiliation extraction
lfoppiano Oct 29, 2025
71497bf
add references in the output
lfoppiano Oct 29, 2025
6f54c61
improve references, fix fulltext
lfoppiano Oct 29, 2025
0ed682e
Various fixes
lfoppiano Oct 29, 2025
4404d0d
fix paths and logs
lfoppiano Oct 30, 2025
8abe2fc
fix --verbose and document it
lfoppiano Oct 30, 2025
bd4e96a
Merge branch 'feature/json_output' into feature/markdown-output
lfoppiano Oct 30, 2025
60ac656
fix paths
lfoppiano Oct 30, 2025
7deac6d
update tests
lfoppiano Oct 31, 2025
2f98e40
fix references offsets, fix missing starting/end offsets, show files …
lfoppiano Oct 31, 2025
3d1d931
fix tests
lfoppiano Oct 31, 2025
d5c9607
Merge branch 'master' into feature/markdown-output
lfoppiano Oct 31, 2025
Update grobid_client/format/TEI2LossyJSON.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
lfoppiano and Copilot authored Oct 12, 2025
commit 21c6cb5327f2721971ba6f15fcf70bf5a9f1c05b
2 changes: 1 addition & 1 deletion grobid_client/format/TEI2LossyJSON.py
@@ -300,7 +300,7 @@ def _convert_file_worker(path: str):
     from bs4 import BeautifulSoup
     import dateparser
     # Reuse existing top-level helpers from this module by importing here
-    from grobid_client.format.TEI2LossyJSON import box_to_dict, get_random_id, get_formatted_passage, get_refs_with_offsets, xml_table_to_markdown, xml_table_to_json
+    from grobid_client.format.TEI2LossyJSON import box_to_dict, get_random_id, get_formatted_passage, get_refs_with_offsets, xml_table_to_json
     content = open(path, 'r').read()
     soup = BeautifulSoup(content, 'xml')
     converter = TEI2LossyJSONConverter()
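
Note on the context line `content = open(path, 'r').read()`: it never closes the file handle explicitly, and two later commits in this PR are titled "Avoid leaking opened files". The snippet below is only a minimal sketch of how such a read could be written with a context manager. It is not taken from the PR; the helper name `read_tei` and the UTF-8 encoding are assumptions made for illustration.

```python
def read_tei(path: str) -> str:
    """Hypothetical helper (not part of grobid_client): read a TEI file as text.

    The `with` block closes the file handle as soon as the read finishes,
    including when an exception is raised while reading.
    """
    # Assumption: the TEI XML is UTF-8 encoded; adjust if the files differ.
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()
```

The returned string could then be handed to `BeautifulSoup(content, 'xml')` exactly as in the hunk above.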