In scientific research, when you come across two studies of a similar problem, you often want to know whether there is follow-up work that compares and evaluates them. For example, XLNet and RoBERTa are two strong pretrained language models, but it is not obvious which one is stronger. A natural next step is to find papers that cite both of them and see how other researchers compare the two. However, popular search engines such as Google Scholar do not directly offer a way to search for papers that cite two specific articles A and B at the same time, so how can we do this?
A highly upvoted answer on StackExchange suggests searching for the title of B within the papers that cite A and then inspecting the results. Since an article that cites B usually contains B's title, this will surface some of the papers we want. However, such a search is not precise: it also returns articles whose keywords merely resemble B's title, so overall the results are not very accurate.
Another approach is to retrieve the citation lists of A and B and take their intersection. If the two papers have only a few citations each, we can quickly find the targets by manual comparison. But if both papers are cited hundreds or thousands of times, sorting and comparing the lists by hand is far too much work.
As programmers, we are not going to do that kind of drudgery by hand; we can solve the problem automatically with a little code.
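The core of the idea is just a set intersection. Here is a minimal sketch with made-up title sets standing in for the real citation lists we will fetch below:

# Hypothetical title sets standing in for the real citation data fetched later.
citations_of_a = {"Paper X", "Paper Y", "Paper Z"}
citations_of_b = {"Paper Y", "Paper Z", "Paper W"}

# Papers that cite both A and B are simply the intersection of the two sets.
co_citing = citations_of_a & citations_of_b
print(co_citing)  # {'Paper Y', 'Paper Z'}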
Here we rely on a handy resource: the paper metadata provided by the Semantic Scholar API. Given a paper's identifier, we can look up its record, including its list of citations, and then implement the idea above.
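In the examples below we use the semanticscholar Python package (pip install semanticscholar), which wraps Semantic Scholar's REST API. If you would rather skip the dependency and call the API directly, a minimal sketch with requests might look like this (it assumes the legacy v1 paper endpoint that the package wraps; check the current API documentation for the latest version):

# Minimal sketch: fetch one paper record directly over HTTP (assumed v1 endpoint).
import requests

resp = requests.get("https://api.semanticscholar.org/v1/paper/arXiv:1906.08237", timeout=10)
resp.raise_for_status()
data = resp.json()
print(data["title"], len(data["citations"]))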
Now let's take XLNet (arXiv:1906.08237) and RoBERTa (arXiv:1907.11692) as examples and find the papers that cite both of them:
import semanticscholar as sch

# XLNet
paper1 = sch.paper('arXiv:1906.08237', timeout=2)
paper1.keys()
dict_keys(['abstract', 'arxivId', 'authors', 'citationVelocity', 'citations', 'corpusId', 'doi', 'fieldsOfStudy', 'influentialCitationCount', 'isOpenAccess', 'isPublisherLicensed', 'is_open_access', 'is_publisher_licensed', 'numCitedBy', 'numCiting', 'paperId', 'references', 'title', 'topics', 'url', 'venue', 'year'])
As you can see, the record contains a lot of information about the paper. Here we mainly care about the citation-related fields.
paper1["title"]
'XLNet: Generalized Autoregressive Pretraining for Language Understanding'
paper1["influentialCitationCount"]
462
len(paper1["citations"])
2616
paper1["citations"][0]
{'arxivId': None, 'authors': [{'authorId': '1519065273', 'name': 'Jia-ying Zhang'}, {'authorId': '3021771', 'name': 'Z. Zhang'}, {'authorId': '49724519', 'name': 'H. Zhang'}, {'authorId': '148431487', 'name': 'Z. Ma'}, {'authorId': '2075311652', 'name': 'Qi Ye'}, {'authorId': '145848393', 'name': 'P. He'}, {'authorId': '2613015', 'name': 'Yangming Zhou'}], 'doi': '10.1016/j.jbi.2020.103628', 'intent': [], 'isInfluential': False, 'paperId': '0031e24987fcdb8cfb0a2b842417c1d0a43a5d06', 'title': 'From electronic health records to terminology base: A novel knowledge base enrichment approach', 'url': 'https://www.semanticscholar.org/paper/0031e24987fcdb8cfb0a2b842417c1d0a43a5d06', 'venue': 'J. Biomed. Informatics', 'year': 2021}
# RoBERTa
paper2 = sch.paper('arXiv:1907.11692', timeout=2)
paper2["title"]
'RoBERTa: A Robustly Optimized BERT Pretraining Approach'
paper2["influentialCitationCount"]
1107
Now that we have the records for both papers, let's take the intersection of the papers that cite them.
Since both papers are influential and have thousands of citations, we can also use the isInfluential flag to keep only the influential citing papers for closer inspection.
co_citations = list(set(x['title'] for x in paper1['citations'] if x['isInfluential'])
                    & set(x['title'] for x in paper2['citations'] if x['isInfluential']))
len(co_citations)
158
print("\n".join(co_citations[:10]))
Exploring Discourse Structures for Argument Impact Classification
STEP: Sequence-to-Sequence Transformer Pre-training for Document Summarization
Parameter-Efficient Transfer Learning with Diff Pruning
Maximal Multiverse Learning for Promoting Cross-Task Generalization of Fine-Tuned Language Models
Defending against Backdoor Attacks in Natural Language Generation
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-based Question Answering
Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation
AR-LSAT: Investigating Analytical Reasoning of Text
memeBot: Towards Automatic Image Meme Generation
ARES: A Reading Comprehension Ensembling Service
That's it. We can now dig into the questions we care about using the literature listed above. I hope this little trick helps you in your work~
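If you want to reuse this, the whole flow fits into one small helper. Here is a minimal sketch built on the same sch.paper call used above (the function name and its defaults are my own, not part of the library):

def find_co_citations(id_a, id_b, influential_only=True, timeout=5):
    """Return sorted titles of papers that cite both id_a and id_b (sketch, not an official API)."""
    paper_a = sch.paper(id_a, timeout=timeout)
    paper_b = sch.paper(id_b, timeout=timeout)

    def citing_titles(paper):
        # Optionally keep only citations that Semantic Scholar marks as influential.
        return {c["title"] for c in paper["citations"]
                if not influential_only or c["isInfluential"]}

    return sorted(citing_titles(paper_a) & citing_titles(paper_b))

# Example usage with the same two papers:
# co_citations = find_co_citations('arXiv:1906.08237', 'arXiv:1907.11692')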