Add logic for parsing references from last page of PDF #11156

Merged
koppor merged 19 commits into main from parse-from-pdf
Apr 8, 2024

Conversation

@koppor
Member

@koppor koppor commented Apr 7, 2024

A scientific paper has a "References" section. Especially when reviewing papers, it would be nice if all references listed there appeared parsed within JabRef. This PR implements that: it implements #10200 via offline parsing (no online services used!) and is a follow-up to #10437.

The parser is rule-based and uses Regular Expressions (RegEx).
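
To illustrate what "rule-based with RegEx" can look like, here is a minimal sketch that splits the text of a references section into single references. It is not the code of this PR: the class `ReferenceSplitter`, the method `splitReferences`, and the marker pattern are hypothetical, and the sketch assumes IEEE-style numeric markers such as `[1]`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JabRef: splits IEEE-style reference text into entries.
public class ReferenceSplitter {

    // Assumption: each reference starts with a numeric marker such as "[12]" followed by whitespace
    private static final Pattern MARKER = Pattern.compile("\\[(\\d+)\\]\\s+");

    public static List<String> splitReferences(String referencesText) {
        List<String> result = new ArrayList<>();
        Matcher matcher = MARKER.matcher(referencesText);
        int start = -1; // start of the reference currently being collected
        while (matcher.find()) {
            if (start >= 0) {
                // Text between the previous marker and this one is one reference
                result.add(normalize(referencesText.substring(start, matcher.start())));
            }
            start = matcher.end();
        }
        if (start >= 0) {
            // Everything after the last marker is the last reference
            result.add(normalize(referencesText.substring(start)));
        }
        return result;
    }

    // Collapses line breaks and repeated spaces that come from the PDF layout
    private static String normalize(String raw) {
        return raw.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String text = "[1] A. Author, \"First title\", 2019.\n"
                + "[2] B. Author, \"Second\ntitle\", 2020.";
        splitReferences(text).forEach(System.out::println);
    }
}
```

A rule like this is cheap and works offline, but it is tied to one citation style: a bracketed number inside a reference would be treated as a new marker, and other styles (e.g., author-year) need their own rules.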

How to use:

Pre Condition

Steps

  1. Create an entry in JabRef
  2. Attach the PDF to JabRef
  3. Open the context menu
  4. Select "Extract references"
    image
  5. A dialog for importing is shown.
  6. Select "Select all entries" and then "Import entries"
    image

Status

  • Functionality implemented. The UI should show "online" and "offline" more transparently; I am currently working on this.
  • Works for IEEE papers. This functionality will be used for 1,000+ papers in this field; thus, it is "OK for now". If other reviewers (e.g., of Springer papers) speak up, we can refine the parser.

Mandatory checks

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

koppor added 3 commits April 7, 2024 09:40
…FromPdfImporter)

- Support more date formats
- Increase log level for issues for date parsing
Member

@Siedlerchr Siedlerchr left a comment

see comments

for (BibEntry importedEntry : result.getDatabase().getEntries()) {
    count++;
    Optional<String> citationKey = importedEntry.getCitationKey();
    if (citationKey.isPresent()) {
Member

citationKey.map(cites::add).orElseGet(() ->

Member Author

I am not sure the new code is more readable: the "orElseGet" result needs to be added to the list, too. It also uses the outer variable "count", which is not final, so I needed to wrap it in an anonymous object.

Member

Then it is better to keep the original code.
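
The trade-off discussed in this thread can be sketched as follows. This is not the code of the PR: the class `CitationKeyCollector` is hypothetical, a list of `Optional<String>` stands in for the citation keys of the imported entries, and the fallback key `"ref" + count` is an assumption, since the actual else branch is not shown in the diff excerpt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the loop under review; Optional<String> replaces BibEntry.getCitationKey()
public class CitationKeyCollector {

    // Variant 1: plain if/else, as in the original code
    public static List<String> collectWithIfElse(List<Optional<String>> citationKeys) {
        List<String> cites = new ArrayList<>();
        int count = 0;
        for (Optional<String> citationKey : citationKeys) {
            count++;
            if (citationKey.isPresent()) {
                cites.add(citationKey.get());
            } else {
                cites.add("ref" + count); // assumed fallback key, for illustration only
            }
        }
        return cites;
    }

    // Variant 2: Optional.orElseGet; the lambda cannot capture "count" directly
    public static List<String> collectWithOrElseGet(List<Optional<String>> citationKeys) {
        List<String> cites = new ArrayList<>();
        int count = 0;
        for (Optional<String> citationKey : citationKeys) {
            count++;
            // Lambdas may only capture effectively final locals, so "count" is copied first
            final int position = count;
            cites.add(citationKey.orElseGet(() -> "ref" + position));
        }
        return cites;
    }
}
```

Both variants produce the same list. The second one needs the extra `position` copy (or a wrapper object) only to satisfy the capture rule, which is an argument for keeping the plain if/else.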

// Y. Shimosaki et al., “Lattice design for 5 MeV – 125 mA CW RFQ operation in LIPAc”, in Proc. IPAC’19, Mel- bourne, Australia, May 2019, pp. 977-979. doi:10.18429/ JACoW-IPAC2019-MOPTS051
int pos = reference.indexOf("doi:");
if (pos >= 0) {
    String doi = reference.substring(pos + 4).trim();
Member

Are you sure that this is always 4 characters?

Member Author

I am pretty sure that the constant string "doi:" always has 4 characters. But in a parallel universe this might change. Thus, I will change it to "doi:".length() later.
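
A minimal sketch of that change, with the prefix held in a constant so the offset cannot drift from the string. The class `DoiExtractor` and the method `extractDoi` are hypothetical and not the code of this PR. Two further assumptions: the DOI is the last item of the reference, and whitespace inside it (as in the sample above, where the PDF line break left "10.18429/ JACoW-...") may be removed.

```java
import java.util.Optional;

// Hypothetical helper, not part of JabRef: extracts a trailing "doi:..." from a reference string
public class DoiExtractor {

    private static final String DOI_PREFIX = "doi:";

    public static Optional<String> extractDoi(String reference) {
        int pos = reference.indexOf(DOI_PREFIX);
        if (pos < 0) {
            return Optional.empty();
        }
        // Offset derived from the constant instead of the magic number 4
        String doi = reference.substring(pos + DOI_PREFIX.length()).trim();
        // Assumption: whitespace inside the DOI stems from PDF line wrapping and can be dropped
        doi = doi.replaceAll("\\s+", "");
        return doi.isEmpty() ? Optional.empty() : Optional.of(doi);
    }

    public static void main(String[] args) {
        String reference = "Y. Shimosaki et al., in Proc. IPAC'19, pp. 977-979. "
                + "doi:10.18429/ JACoW-IPAC2019-MOPTS051";
        System.out.println(extractDoi(reference).orElse("no DOI found"));
    }
}
```

If the DOI is not the last item of a reference, the whitespace removal would merge trailing text into it, so a stricter rule would be needed in that case.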

@Siedlerchr
Member

You should resolve the conflicts in CHANGELOG.md so that the tests run.

@koppor koppor changed the title from "[WIP] Add logic for parsing references from last page of PDF" to "Add logic for parsing references from last page of PDF" Apr 7, 2024
@koppor koppor marked this pull request as ready for review April 7, 2024 11:31
@koppor koppor enabled auto-merge April 7, 2024 11:31
@koppor koppor mentioned this pull request Apr 7, 2024
6 tasks
This reverts commit 7adb334.
@koppor koppor added the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Apr 7, 2024
@github-actions
Contributor

github-actions bot commented Apr 8, 2024

The build for this PR is no longer available. Please visit https://builds.jabref.org/main/ for the latest build.

@koppor koppor added this pull request to the merge queue Apr 8, 2024
Merged via the queue into main with commit a0080ba Apr 8, 2024
@koppor koppor deleted the parse-from-pdf branch April 8, 2024 09:05
@koppor koppor mentioned this pull request Jul 21, 2024
6 tasks