DOI to Bib(La)TeX – a misery
My PhD thesis-to-be combines 8 papers from the last 5 years. Their Bib(La)TeX bibliography entries vary widely in quality and style. I would like some consistency, but achieving it across 234 entries is quite an effort. So I was wondering if there’s any good-quality, consistent source from which I could (hopefully automatically) update their data via their DOIs.
Services
Let’s look at a bunch of services for getting BibTeX entries by DOI. I’ll use the DOI 10.1007/978-3-031-50524-9_4 (one of my papers) as the example.
For each service below, I show the BibTeX entry it returns, followed by my comments about it.
DOI

```bibtex
@inbook{Saan_2023, title={Correctness Witness Validation by Abstract Interpretation}, ISBN={9783031505249}, ISSN={1611-3349}, url={http://dx.doi.org/10.1007/978-3-031-50524-9_4}, DOI={10.1007/978-3-031-50524-9_4}, booktitle={Verification, Model Checking, and Abstract Interpretation}, publisher={Springer Nature Switzerland}, author={Saan, Simmo and Schwarz, Michael and Erhard, Julian and Seidl, Helmut and Tilscher, Sarah and Vojdani, Vesal}, year={2023}, month=dec, pages={74–97} }
```
This is returned by DOI Content Negotiation, which simply means making an HTTP(S) request to the usual DOI URL https://doi.org/10.1007/978-3-031-50524-9_4 but with the `Accept: application/x-bibtex` HTTP header, i.e.

```shell
curl -LH "Accept: application/x-bibtex" https://doi.org/10.1007/978-3-031-50524-9_4
```

For this particular DOI, this actually delegates to the Crossref API at

```shell
curl -L https://api.crossref.org/works/10.1007/978-3-031-50524-9_4/transform/application/x-bibtex
```
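Because the goal is to update many entries, ideally automatically, the same request can be wrapped in a function and run over a list of DOIs. The following is a minimal Bash sketch and assumes `curl` is installed. The function name `doi_to_bibtex` and the files `dois.txt` and `fetched.bib` in the usage comment are hypothetical. The sketch also assumes the DOIs contain no characters that would need percent-encoding in a URL path.

```shell
#!/usr/bin/env bash
# Sketch: fetch BibTeX for one DOI via DOI Content Negotiation.
# doi_to_bibtex is a hypothetical helper name, not an existing tool.
doi_to_bibtex() {
  local doi="$1"
  # -s: no progress output, -f: fail on HTTP errors (e.g. unknown DOI),
  # -L: follow the redirect from doi.org to the registration agency (here Crossref)
  curl -sfL -H "Accept: application/x-bibtex" "https://doi.org/${doi}"
}

# Example usage (commented out so that sourcing the sketch does not hit the network):
# while IFS= read -r doi; do
#   doi_to_bibtex "$doi" || echo "failed: $doi" >&2
#   echo
# done < dois.txt > fetched.bib
```

Every entry fetched this way still has the issues discussed in the comments below.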
Comments
- The entry type is `@inbook`, although `@inproceedings` would be more precise for this work.
- The `url` field has the value http://dx.doi.org/10.1007/978-3-031-50524-9_4. There are two things wrong with that:
  - It’s HTTP, not HTTPS.
  - It uses dx.doi.org, not just doi.org.

  The former options in both points are no longer preferred, yet the official DOI metadata service doesn’t follow its own recommendations.
- The `booktitle` field is actually not specified for `@inbook` in BibTeX. It is specified for `@inproceedings`, so the entry type really should be that. In BibLaTeX, `booktitle` is also specified for `@inbook`, but only because BibLaTeX gives `@inbook` a slightly different meaning than BibTeX.
- The whole result is on one line (fine) and has a spurious single space at the beginning (which is odd).
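The mechanical issues above can be patched after the fact. The following is a minimal `sed -E` sketch and assumes the input comes directly from DOI Content Negotiation. The function name `fix_crossref_bibtex` is hypothetical. Changing `@inbook` to `@inproceedings` without checking is only correct when the entries really are conference papers. Genuine book chapters need different treatment.

```shell
#!/usr/bin/env bash
# Sketch: patch known issues of a DOI Content Negotiation (Crossref) entry.
# fix_crossref_bibtex is a hypothetical name; it reads BibTeX on stdin.
fix_crossref_bibtex() {
  sed -E \
    -e 's/^[[:space:]]+@/@/' \
    -e 's#https?://(dx\.)?doi\.org/#https://doi.org/#g' \
    -e 's/^@inbook\{/@inproceedings{/'
  # 1. strip the spurious leading whitespace before the entry
  # 2. rewrite http(s)://dx.doi.org/ and http://doi.org/ to https://doi.org/
  # 3. retype @inbook as @inproceedings (assumes conference papers only!)
}
```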
DOI formatter

```bibtex
@misc{Saan_Schwarz_Erhard_Seidl_Tilscher_Vojdani_2023, title={Correctness Witness Validation by Abstract Interpretation}, url={http://dx.doi.org/10.1007/978-3-031-50524-9_4}, DOI={10.1007/978-3-031-50524-9_4}, journal={Lecture Notes in Computer Science}, publisher={Springer Nature Switzerland}, author={Saan, Simmo and Schwarz, Michael and Erhard, Julian and Seidl, Helmut and Tilscher, Sarah and Vojdani, Vesal}, year={2023}, month=dec, pages={74–97}, language={en} }
```
This is returned by the DOI Citation Formatter for the style `bibtex`, which can also be accessed through an API:

```shell
curl 'https://citation.doi.org/format?doi=10.1007%2F978-3-031-50524-9_4&style=bibtex&lang=en-US'
```
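In the query string above, the DOI’s slash is percent-encoded as `%2F`. The following is a minimal Bash sketch for building such a request URL from any DOI and assumes plain-ASCII DOIs. The names `urlencode` and `citation_formatter_url` are hypothetical helpers. The sketch only uses the `doi`, `style` and `lang` parameters that appear in the URL above.

```shell
#!/usr/bin/env bash
# Sketch: build a DOI Citation Formatter request URL.
# Helper names are hypothetical; assumes ASCII-only input.

# Percent-encode every character except RFC 3986 unreserved ones.
urlencode() {
  local s="$1" out="" c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9.~_-]) out+="$c" ;;
      *) printf -v hex '%%%02X' "'$c"; out+="$hex" ;;  # e.g. "/" -> %2F
    esac
  done
  printf '%s' "$out"
}

citation_formatter_url() {
  local doi="$1" style="${2:-bibtex}"
  printf 'https://citation.doi.org/format?doi=%s&style=%s&lang=en-US' \
    "$(urlencode "$doi")" "$(urlencode "$style")"
}

# Example usage:
# curl "$(citation_formatter_url 10.1007/978-3-031-50524-9_4 bibtex)"
```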
Comments
It’s quite similar to the previous one from DOI Content Negotiation, but objectively worse:
- The entry type is now just `@misc`.
- The `booktitle` field is missing (it’s not specified for `@misc` anyway), and the title “Verification, Model Checking, and Abstract Interpretation” isn’t in any other field either.
- The `journal` field is now present (it’s not specified for `@misc` either!) and has the value “Lecture Notes in Computer Science”, which isn’t a journal but a book series (which would belong in the `series` field, if it weren’t for `@misc`).
doi2bib

```bibtex
@inbook{Saan2023, title = {Correctness Witness Validation by Abstract Interpretation}, ISBN = {9783031505249}, ISSN = {1611-3349}, url = {http://dx.doi.org/10.1007/978-3-031-50524-9_4}, DOI = {10.1007/978-3-031-50524-9_4}, booktitle = {Verification, Model Checking, and Abstract Interpretation}, publisher = {Springer Nature Switzerland}, author = {Saan, Simmo and Schwarz, Michael and Erhard, Julian and Seidl, Helmut and Tilscher, Sarah and Vojdani, Vesal}, year = {2023}, month = dec, pages = {74–97} }
```
This is returned by doi2bib at https://www.doi2bib.org/bib/10.1007/978-3-031-50524-9_4. doi2bib is just a browser frontend for DOI Content Negotiation and performs client-side reformatting. As far as I have seen, many other tools also use DOI Content Negotiation under the hood.
Comments
It has all the issues of DOI Content Negotiation and only the following differences:
- The formatting is generally more human-friendly.
- The formatting adds double spaces after commas in field values. This shouldn’t affect Bib(La)TeX, but is odd nevertheless.
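doi2bib does its reformatting in the browser. To show roughly what that involves, the following Bash/awk sketch splits a one-line entry into one field per line. It tracks brace depth and double-quote delimiters, so commas inside field values are left alone. `pretty_bibtex` is a hypothetical name. The sketch assumes one entry per input line with balanced braces. It does not reproduce doi2bib’s exact output, such as the double spaces.

```shell
#!/usr/bin/env bash
# Sketch: one-line BibTeX entry -> one field per line, two-space indented.
# pretty_bibtex is a hypothetical name; assumes one entry per input line.
pretty_bibtex() {
  awk '
  {
    line = $0
    sub(/^[ \t]+/, "", line)              # drop leading whitespace
    out = ""; depth = 0; inquote = 0
    for (i = 1; i <= length(line); i++) {
      c = substr(line, i, 1)
      if (c == "{") depth++
      else if (c == "}") depth--
      else if (c == "\"" && depth == 1) inquote = !inquote
      if (c == "," && depth == 1 && !inquote) {
        out = out ",\n  "                 # comma at entry level separates fields
        while (substr(line, i + 1, 1) == " ") i++
        continue
      }
      if (c == "}" && depth == 0) {
        sub(/[ \t]+$/, "", out)           # trim space before the closing brace
        out = out "\n}"
        continue
      }
      out = out c
    }
    print out
  }'
}
```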
Springer

```bibtex
@InProceedings{10.1007/978-3-031-50524-9_4,
author="Saan, Simmo and Schwarz, Michael and Erhard, Julian and Seidl, Helmut and Tilscher, Sarah and Vojdani, Vesal",
editor="Dimitrova, Rayna and Lahav, Ori and Wolff, Sebastian",
title="Correctness Witness Validation by Abstract Interpretation",
booktitle="Verification, Model Checking, and Abstract Interpretation",
year="2024",
publisher="Springer Nature Switzerland",
address="Cham",
pages="74--97",
abstract="Witnesses record automated program analysis results and make them exchangeable. To validate correctness witnesses through abstract interpretation, we introduce a novel abstract operation unassume. This operator incorporates witness invariants into the abstract program state. Given suitable invariants, the unassume operation can accelerate fixpoint convergence and yield more precise results. We demonstrate the feasibility of this approach by augmenting an abstract interpreter with unassume operators and evaluating the impact of incorporating witnesses on performance and precision. Using manually crafted witnesses, we can confirm verification results for multi-threaded programs with a reduction in effort ranging from 7{\%} to 47{\%} in CPU time. More intriguingly, we discover that using witnesses from model checkers can guide our analyzer to verify program properties that it could not verify on its own.",
isbn="978-3-031-50524-9"
}
```
This is returned by the “Download citation (.BIB)” feature of Springer Link, which the particular DOI points to:

```shell
curl 'https://citation-needed.springer.com/v2/references/10.1007/978-3-031-50524-9_4?format=bibtex&flavour=citation'
```
Comments
- This is completely different from the previous ones based on DOI Content Negotiation. I guess this is because those actually come from Crossref’s database, while this one comes from Springer’s own database, but as a user I shouldn’t have to know or care. It’s still Springer submitting data to Crossref, and the DOI URL itself redirects to Springer under normal conditions (a standard HTTP request).
- The entry type is `@InProceedings`, which is more accurate than all the previous ones.
- The `doi` field is missing. The DOI is in the entry key, although that doesn’t make the DOI show up in a Bib(La)TeX bibliography.
- The `url` field is also missing. Thus, there would be no digital reference in a rendered bibliography.
- The formatting is multiline, but not indented.
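Springer puts the DOI into the entry key, so the missing `doi` and `url` fields can be restored from it. The following is a minimal awk sketch and assumes the multiline export format, with the `@Type{KEY,` header on its own line. `add_doi_from_key` is a hypothetical name. Only keys that start with the DOI prefix `10.` are touched.

```shell
#!/usr/bin/env bash
# Sketch: add doi and url fields derived from a DOI-valued entry key.
# add_doi_from_key is a hypothetical name; reads BibTeX on stdin.
add_doi_from_key() {
  awk '
  {
    sub(/\r$/, "")                        # tolerate Windows line endings
    print
    # Entry header whose key looks like a DOI, e.g. @InProceedings{10.1007/...,
    if ($0 ~ /^@[A-Za-z]+[{]10[.][^,]+,$/) {
      key = substr($0, index($0, "{") + 1)
      sub(/,$/, "", key)
      print "doi=\"" key "\","
      print "url=\"https://doi.org/" key "\","
    }
  }'
}
```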
ACM

```bibtex
@inproceedings{10.1007/978-3-031-50524-9_4,
author = {Saan, Simmo and Schwarz, Michael and Erhard, Julian and Seidl, Helmut and Tilscher, Sarah and Vojdani, Vesal},
title = {Correctness Witness Validation by Abstract Interpretation},
year = {2024},
isbn = {978-3-031-50523-2},
publisher = {Springer-Verlag},
address = {Berlin, Heidelberg},
url = {https://doi.org/10.1007/978-3-031-50524-9_4},
doi = {10.1007/978-3-031-50524-9_4},
abstract = {Witnesses record automated program analysis results and make them exchangeable. To validate correctness witnesses through abstract interpretation, we introduce a novel abstract operation unassume. This operator incorporates witness invariants into the abstract program state. Given suitable invariants, the unassume operation can accelerate fixpoint convergence and yield more precise results. We demonstrate the feasibility of this approach by augmenting an abstract interpreter with unassume operators and evaluating the impact of incorporating witnesses on performance and precision. Using manually crafted witnesses, we can confirm verification results for multi-threaded programs with a reduction in effort ranging from 7\% to 47\% in CPU time. More intriguingly, we discover that using witnesses from model checkers can guide our analyzer to verify program properties that it could not verify on its own.},
booktitle = {Verification, Model Checking, and Abstract Interpretation: 25th International Conference, VMCAI 2024, London, United Kingdom, January 15–16, 2024, Proceedings, Part I},
pages = {74–97},
numpages = {24},
keywords = {Correctness Witness, Witness Validation, Software Verification, Program Analysis, Abstract Interpretation},
location = {London, United Kingdom}
}
```
This is returned by the “Export Citation” feature of ACM Digital Library at https://dl.acm.org/doi/10.1007/978-3-031-50524-9_4. Although the particular work is published by Springer, ACM seems to index it.
Comments
- The `title` field value includes , which is inappropriate for Bib(La)TeX.
- The `publisher` and `address` field values “Springer-Verlag” and “Berlin, Heidelberg” seem wrong, because Springer itself returned “Springer Nature Switzerland” and “Cham”. (Although personally I don’t care: I would drop the `address` and simplify `publisher` to “Springer”.)
- The `booktitle` field value includes the book’s subtitle “25th International Conference, VMCAI 2024, London, United Kingdom, January 15–16, 2024, Proceedings, Part I”. In BibTeX, there’s no other way (except omitting it like all previous services did). BibLaTeX specifies the `booksubtitle` field, and even more appropriate ones like `eventtitle`, `venue` and `eventdate` (as also pointed out in this TeX StackExchange answer).
- The formatting is multiline, but not indented.
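The personal preference stated above, dropping `address` and shortening the publisher to “Springer”, can be applied with `sed`. The following is a minimal sketch and assumes one field per line, as in the ACM export. `simplify_publisher` is a hypothetical name. It drops every `address` field, not just Springer ones. BibTeX field names are case-insensitive, but these patterns only match lowercase names.

```shell
#!/usr/bin/env bash
# Sketch: delete address lines and shorten Springer publisher names.
# simplify_publisher is a hypothetical name; assumes one field per line.
simplify_publisher() {
  sed -E \
    -e '/^[[:space:]]*address[[:space:]]*=/d' \
    -e 's/^([[:space:]]*publisher[[:space:]]*=[[:space:]]*)[{"]Springer[^}"]*[}"]/\1{Springer}/'
  # The second rule handles both {…} and "…" delimited values
  # (e.g. ACM's {Springer-Verlag} and Springer's "Springer Nature Switzerland").
}
```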
DBLP condensed

```bibtex
@inproceedings{DBLP:conf/vmcai/SaanSESTV24, author = {Simmo Saan and Michael Schwarz and Julian Erhard and Helmut Seidl and Sarah Tilscher and Vesal Vojdani}, title = {Correctness Witness Validation by Abstract Interpretation}, booktitle = {{VMCAI} {(1)}}, series = {Lecture Notes in Computer Science}, volume = {14499}, pages = {74--97}, publisher = {Springer}, year = {2024} }
```
This is returned by the “export record (BibTeX)” feature of DBLP at https://dblp.org/rec/conf/vmcai/SaanSESTV24.html?view=bibtex&param=0. DBLP offers multiple BibTeX formats; this is the condensed one.
Comments
- The `doi` field is missing and, unlike with Springer, the DOI isn’t in the entry key either.
- The `url` field is also missing.
- The `volume` field value is “14499”, which actually corresponds to the `series` “Lecture Notes in Computer Science”. This is wrong in both BibTeX and BibLaTeX: it should instead be the `number` field with the value “14499”.

  BibTeX specifies:
  - `number`: The number of […] a work in a series. […] sometimes books are given numbers in a named series.
  - `volume`: The volume of a journal or multivolume book.

  BibLaTeX specifies:
  - `number`: […] the volume/number of a book in a series.
  - `volume`: The volume of a multi-volume book or a periodical.

  This has also been pointed out in this TeX StackExchange answer.
- The `booktitle` field value is essentially “VMCAI (1)”, where the 1 refers to the part. According to the specifications above, the part number is what should actually go into the `volume` field.

  Alternatively, BibLaTeX also specifies:
  - `part`: The number of a partial volume. This field applies to books only, not to journals. It may be used when a logical volume consists of two or more physical ones. In this case the number of the logical volume goes in the `volume` field and the number of the part of that volume in the `part` field.

  The distinction between logical and physical is a bit hazy in this case. Even Springer cannot make up its mind about the terminology:
  - The subtitle of the book ends with “Part I”.
  - The Springer Link page for the book has the section “Other volumes”.
  - The “About this book” section on the same page mentions both, starting with “The two-volume set LNCS 14499 and 14500 […]”.
- The formatting is the nicest of them all. However, when copying the BibTeX code from the DBLP website, the copied text includes two empty leading and trailing lines for some reason. The empty lines are not present in the downloadable .bib file.
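The `volume` to `number` fix for LNCS entries is mechanical enough to script. The following is a minimal awk sketch and assumes one field per line, with each entry closed by a line starting with `}`. `lncs_volume_to_number` is a hypothetical name. It does not move the part number “(1)” out of `booktitle`, so that still needs manual work.

```shell
#!/usr/bin/env bash
# Sketch: in entries with series = {Lecture Notes in Computer Science},
# rename the volume field to number. lncs_volume_to_number is hypothetical.
lncs_volume_to_number() {
  awk '
  /^@/ { inentry = 1; n = 0; lncs = 0 }
  !inentry { print; next }               # text between entries passes through
  {
    buf[++n] = $0                        # buffer the entry; series may come after volume
    if ($0 ~ /^[ \t]*series[ \t]*=[ \t]*[{]Lecture Notes in Computer Science[}]/) lncs = 1
  }
  /^[}]/ {
    for (i = 1; i <= n; i++) {
      line = buf[i]
      # volume and number have the same length, so alignment is preserved
      if (lncs && line ~ /^[ \t]*volume[ \t]*=/) sub(/volume/, "number", line)
      print line
    }
    inentry = 0
  }
  END { for (i = 1; i <= n && inentry; i++) print buf[i] }'
}
```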
DBLP standard

```bibtex
@inproceedings{DBLP:conf/vmcai/SaanSESTV24, author = {Simmo Saan and Michael Schwarz and Julian Erhard and Helmut Seidl and Sarah Tilscher and Vesal Vojdani}, editor = {Rayna Dimitrova and Ori Lahav and Sebastian Wolff}, title = {Correctness Witness Validation by Abstract Interpretation}, booktitle = {Verification, Model Checking, and Abstract Interpretation - 25th International Conference, {VMCAI} 2024, London, United Kingdom, January 15-16, 2024, Proceedings, Part {I}}, series = {Lecture Notes in Computer Science}, volume = {14499}, pages = {74--97}, publisher = {Springer}, year = {2024}, url = {https://doi.org/10.1007/978-3-031-50524-9\_4}, doi = {10.1007/978-3-031-50524-9\_4}, timestamp = {Sat, 10 Feb 2024 18:04:44 +0100}, biburl = {https://dblp.org/rec/conf/vmcai/SaanSESTV24.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
```
This is returned by the “export record (BibTeX)” feature of DBLP at https://dblp.org/rec/conf/vmcai/SaanSESTV24.html?view=bibtex&param=1. DBLP offers multiple BibTeX formats; this is the standard one.
Comments
It is mostly an extension of the previous one from DBLP, but with additional fields which can be (and are) treated incorrectly:
- The `doi` field value has the underscore escaped. This is unnecessary and even wrong: looking up the escaped DOI returns “DOI Not Found”.
- The `url` field value also has the underscore escaped. This is again unnecessary and even wrong: the URL is broken.
- The `booktitle` field value is uncondensed, but has the same issues as with ACM.
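The escaped underscores are easy to undo. The following minimal `sed -E` sketch removes the backslash from `\_` only on `doi` and `url` lines. Other fields keep their `\_`, because BibTeX text does need the escape there. `unescape_doi_url` is a hypothetical name. The sketch assumes each field starts on its own line with a lowercase field name, which is presumably the case in the downloadable DBLP .bib file.

```shell
#!/usr/bin/env bash
# Sketch: turn \_ back into _ in doi and url fields only.
# unescape_doi_url is a hypothetical name; assumes one field per line.
unescape_doi_url() {
  sed -E '/^[[:space:]]*(doi|url)[[:space:]]*=/ s/\\_/_/g'
}
```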
Comparison
Here’s a table summarizing some aspects of the entries returned by the services. The values I consider acceptable are in bold² and the values I prefer are in italic.

Feature | DOI | DOI formatter | doi2bib | Springer | ACM | DBLP condensed | DBLP standard |
---|---|---|---|---|---|---|---|
Entry type | `@inbook` | `@misc` | `@inbook` | `@InProceedings` | `@inproceedings` | `@inproceedings` | `@inproceedings` |
`doi` | Yes | Yes | Yes | No | Yes | No | Yes, but `\_`-escaped |
`url` | dx.doi.org | dx.doi.org | dx.doi.org | No | doi.org | No | doi.org, but `\_`-escaped |
`year` | 2023 | 2023 | 2023 | 2024 | 2024 | 2024 | 2024 |
Event info | No | No | No | No | In `booktitle` | No | In `booktitle` |
LNCS № | No | No | No | No | No | In `volume` | In `volume` |
Book part | No | No | No | No | In `booktitle` | In `booktitle` | In `booktitle` |

The table also compares the `year` field values, which weren’t discussed above. Surprisingly, there isn’t even consensus about such a basic fact. It probably has to do with the “First Online: 30 December 2023” date on Springer Link. The Crossref data for the DOI seems to correspond to that, while Springer itself considers the publication to be from 2024, which is also when the conference took place. This just goes to show that the DOI Content Negotiation data, which many other services use, may be inaccurate even with respect to the very basics.
Conclusion
I learned about DOI Content Negotiation and how bad it actually is for BibTeX. The databases (Springer, ACM, DBLP) are better, but none is perfect or even good enough, as the comparison table reveals. I guess I’ll end up doing a lot of manual work, although some of it is semi-automatable using BibLaTeX source maps (which are a story for another time).
² The CSS of the website theme I’m using is such that bold doesn’t work together with `monospace`. I should fix that. But until then, just imagine `@inproceedings` (and `@InProceedings`) being bold in the table.