linguistic research / translation / editing

[This post has been updated, below, and see here most recently for the journal’s response.]

This post provides an update on the AI peer review incident reported previously on this site, in December 2023.

The article described here had been commissioned for a special issue of mediAzioni journal (Università di Bologna – Forlì), to be published in late 2024 as conference proceedings of the TaCo conference, on “taboo” in language, held in Rome in September 2022. The assessment of the paper from the editors was positive: “Both referees have indicated that the paper is suitable for publication, although significant changes should be made.”

After discovering that generative AI had been used to assess my paper, I sent a detailed response to the guest editors of the special issue. This included many concrete questions for the reviewers to answer; a request that my response be shared with the scientific committee; and a request that a revised version of my paper undergo a replacement round of human review.

Upon reflection, I then escalated the matter, appealing to the editors in chief of mediAzioni journal to conduct an official investigation involving all members of the scientific board. On 21 December 2023, the guest editors and the mediAzioni editors indicated that a “thorough investigation” and a “thorough evaluation” would be conducted after the holiday break.

On 11 January 2024, the editors responded denying the allegations, without providing any substantiation for their position:

Dear Nicholas Lo Vecchio,

both the Special Issue guest editors and the mediAzioni editorial team have now carefully assessed your claims about AI having allegedly been used to create the reviews for the paper you have submitted and have found them unsubstantiated and without merit. We have no reason to doubt neither the authenticity of the reviews nor the ethical conduct of the two reviewers. Therefore, we stand by our original message sent to you on December 19th, 2023.

Kind regards,
The Special Issue guest editors
The mediAzioni editors-in-chief

I withdrew my article and informed the editors that the matter would be considered open and unresolved until the mediAzioni journal provided an adequate response. I submitted a list of additional questions about the investigation, again asking the editors to confront the reviewers directly and to involve all members of the scientific committee. The editors did not respond, either to that email or to any of the questions raised in previous communications.

As of this morning, I have appealed by email to all members of the mediAzioni scientific committee (Advisory Board) in the hopes that they will take this matter seriously and seek to establish public accountability for the individuals responsible.

I have also contacted all participants of the 2022 TaCo conference, urging other submitting authors to carefully examine their own review reports to determine whether similar AI text patterns can be detected.

In addition to the ethical breach, there are legal concerns about confidentiality, as any unauthorized processing of texts in a commercial AI platform could have led to the theft and undue monetization of intellectual property. The automated analyses themselves raise various other issues.

Here, in the spirit of transparency, I am now releasing the review reports. Two versions are available:

The most up-to-date version of my withdrawn article is available in prepublication form on the main page of this website: “Using Translation to Chart the Early Spread of GAY Terminology” [link].

Nicholas Lo Vecchio
31 January 2024

New update with ChatGPT tests

To further update on the matter of AI use in the peer review of my article: as of today I have received no response from the members of the mediAzioni Advisory Board, either individually or collectively. I had requested some sort of formal response by end of day on Monday 5 February.

In the continued spirit of transparency, I am now releasing a file with simulations run in ChatGPT that convincingly demonstrate that this was the generative AI platform most likely used to produce the review reports for my article.

Among the many text patterns recurring in both the review reports and the ChatGPT output are the following: “the author should consider,” “there are instances,” “to improve (the) clarity,” “benefit from,” “help readers,” “reader’s understanding,” “unique perspective,” “existing literature,” “issue(s) at hand,” “fall short,” “fully grasp,” “fully explore,” “enhance,” “enrich,” “solidify,” “lack,” “findings,” “elements,” “dimensions,” “challenging,” “roadmap,” “alternative interpretations.” In reviews that do not contain a single interesting or engaging remark about my arguments, the most appalling examples are those involving text cohesion: “seamless,” “transition,” “gaps,” “progression,” “coherence,” “cohesion,” “abrupt,” “disjointed.” No other peer review report I have received has exhibited similar text patterns.
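The kind of phrase check described above is simple to reproduce. Here is a minimal sketch, in Python, that counts case-insensitive occurrences of such phrases in a review text; the phrase list is a subset of the patterns noted above, and the sample text is an invented placeholder, not an actual review report.

```python
import re

# Subset of the recurring phrases listed above.
PHRASES = [
    "the author should consider", "there are instances", "benefit from",
    "help readers", "unique perspective", "existing literature",
    "fall short", "fully grasp", "seamless", "transition", "coherence",
    "cohesion", "abrupt", "disjointed",
]

def count_phrases(text: str) -> dict[str, int]:
    """Return counts of each phrase (case-insensitive) found in the text."""
    lowered = text.lower()
    return {p: len(re.findall(re.escape(p), lowered)) for p in PHRASES}

# Invented sample imitating the register of the reports.
sample = ("The author should consider revising the abrupt transitions; "
          "a more seamless progression would benefit from clearer cohesion.")
hits = {p: n for p, n in count_phrases(sample).items() if n}
print(hits)
```

A raw count like this proves nothing on its own; the point of the released reports is that readers can run exactly this kind of comparison against their own review files.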

The scholarly and press dialogue on the use of AI in peer review has so far been skewed toward the hard sciences; see a bibliography here. A more critical discussion that accounts for its ill-suited use in the humanities is urgently needed.

The concrete outcome I would like is for the journal editors and board to assume their own responsibility by acknowledging the error, and for the individuals responsible to apologize.

Nicholas Lo Vecchio
7 February 2024

An article describing AI text patterns

I was interested to read Liang et al.’s recent paper “Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews” (not peer reviewed; preprint posted to arXiv, arXiv:2403.07183, 11 March 2024), and I look forward to similar research that concretely describes the text patterns in AI-generated peer reviews. Rather than identifying individual peer review reports where AI was used, this article attempts to probabilistically identify text patterns “at scale” across a conference peer review corpus.

Most interesting to me are the word lists/clouds (pp. 15 and 16) of the top 100 adjectives and adverbs “disproportionately used more frequently by AI” in the corpus studied (with emphasis here on “in the corpus studied”). The review reports I received from mediAzioni overlap with Liang et al.’s word lists. Here is a non-exhaustive sample: comprehensive (4 tokens), additionally (4), broader (3), potentially (2), valuable (1), unique (1), substantial (1), particularly (1), notably (1); and note also: robust (3 tokens; cf. robustly in Liang), seamless(ly) (2; cf. seamlessly in Liang), thorough (2; cf. thoroughly in Liang), credibility (1; cf. credible in Liang), critical (1; cf. critically in Liang). No other peer review report I have received exhibits such patterns. Meanwhile, because the two review reports do not make positive or constructive remarks, it does not come as a surprise that some of the disproportionately used positive terms cited in the paper are absent from my reports (such as commendable, innovative, meticulous, valuable).
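The token-count comparison above can be sketched in a few lines of Python: tally how often words from a reference list appear in a review text. The word set here is a small hand-picked subset of the adjectives and adverbs reported by Liang et al., and the sample text is an invented placeholder, not one of the actual reports.

```python
import re
from collections import Counter

# Small subset of Liang et al.'s "disproportionately AI" adjectives/adverbs.
LIANG_WORDS = {"comprehensive", "additionally", "broader", "potentially",
               "valuable", "robust", "notably", "commendable", "meticulous"}

def overlap_counts(text: str) -> dict[str, int]:
    """Token counts for reference-list words occurring in the text."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    return {w: tokens[w] for w in LIANG_WORDS if tokens[w]}

# Invented sample imitating the register of the reports.
sample = ("Additionally, a more comprehensive and robust framing could "
          "potentially situate the findings in a broader context. "
          "Additionally, the comparison is notably underdeveloped.")
print(overlap_counts(sample))
```

As with any such tally, what matters is the relative skew against other reviews one has received, not the raw counts themselves.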

The Liang et al. paper makes inferences “at scale” indicating that generative AI is being used in peer review. This involves extrapolating out from known AI text patterns, just as AI detectors do. Like AI detectors, Liang et al.’s methods can never establish with 100 percent certainty what is AI- or human-produced, but can only establish relative levels of probability based on a given corpus. If we acknowledge (as we must) that generative AI definitely is being used to produce peer review reports, we must logically then ask questions about the humans who are doing so and under what conditions. It is not enough to seek evidence “at scale,” which elides human agency and suggests that the robot takeover is just somehow happening, independently of the individual humans enacting it – unethically so, in the case of undisclosed and unconsented AI use in peer review. AI in peer review is not something that just “is happening”; individual humans are responsible for it, with the results evidenced in concrete peer review reports. Individuals need to be held accountable, which is why it is important to look at the actual AI-generated texts themselves – hence my decision to publish the reports above.

As tech journalist Emanuel Maiberg noted in the 404 Media podcast on this topic (tying in to Maiberg’s 404 article “ChatGPT Looms Over the Peer-Review Crisis,” 2 April 2024), arXiv as a preprint repository itself represents a development in scientific knowledge creation: arXiv papers are not peer reviewed (yet, in principle), though they may be widely cited and engaged with (as I am doing here and as Maiberg did in the article). This is a reminder that “science” does not refer to one thing, and non-peer-reviewed science as well as junk science from paper mills and the like compete for public, or specialist, attention alongside peer-reviewed work whose quality itself varies greatly. This is important to keep in mind when hearing pieties about the sanctity of peer review: in the practical terms of public reach, the paradigm of knowledge gatekeeping has already shifted.

Nicholas Lo Vecchio
6 April 2024