In scientific research, when you come across two studies of a similar problem, you often want to know whether there is follow-up work that compares and evaluates them. For example, XLNet and RoBERTa are two strong pretrained language models, but it is not obvious which one is stronger. A natural next step is to find papers that cite both of them and see how other researchers compare the two. However, popular search engines such as Google Scholar do not directly offer a way to search for papers that cite two specific articles A and B at the same time, so how can we do this?
A highly upvoted answer on StackExchange suggests searching for the title of B within the papers that cite A and then inspecting the results. Since an article that cites B usually contains B's title, this will surface some of the papers we want. However, such a search is not precise: it also returns articles whose keywords merely resemble B's title, so overall the results are not very accurate.
Another approach is to retrieve the citation lists of A and B and take their intersection. If the two papers have only a few citations each, we can quickly find the targets by manual comparison. But if both papers are cited hundreds or thousands of times, sorting and comparing the lists by hand is far too much work.
As programmers, we are not going to do that kind of drudgery by hand; we can solve the problem automatically with a little code.
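The core of the idea is just a set intersection. Here is a minimal sketch with made-up title sets standing in for the real citation lists we will fetch below:

# Hypothetical title sets standing in for the real citation data fetched later.
citations_of_a = {"Paper X", "Paper Y", "Paper Z"}
citations_of_b = {"Paper Y", "Paper Z", "Paper W"}

# Papers that cite both A and B are simply the intersection of the two sets.
co_citing = citations_of_a & citations_of_b
print(co_citing)  # {'Paper Y', 'Paper Z'}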
Here we rely on a handy resource: the paper metadata provided by the Semantic Scholar API. Given a paper's identifier, we can look up its record, including its list of citations, and then implement the idea above.
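In the examples below we use the semanticscholar Python package (pip install semanticscholar), which wraps Semantic Scholar's REST API. If you would rather skip the dependency and call the API directly, a minimal sketch with requests might look like this (it assumes the legacy v1 paper endpoint that the package wraps; check the current API documentation for the latest version):

# Minimal sketch: fetch one paper record directly over HTTP (assumed v1 endpoint).
import requests

resp = requests.get("https://api.semanticscholar.org/v1/paper/arXiv:1906.08237", timeout=10)
resp.raise_for_status()
data = resp.json()
print(data["title"], len(data["citations"]))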
Now let's take XLNet (arXiv:1906.08237) and RoBERTa (arXiv:1907.11692) as examples and find the papers that cite both of them:
import semanticscholar as sch

# XLNet
paper1 = sch.paper('arXiv:1906.08237', timeout=2)
paper1.keys()
dict_keys(['abstract', 'arxivId', 'authors', 'citationVelocity', 'citations', 'corpusId', 'doi', 'fieldsOfStudy', 'influentialCitationCount', 'isOpenAccess', 'isPublisherLicensed', 'is_open_access', 'is_publisher_licensed', 'numCitedBy', 'numCiting', 'paperId', 'references', 'title', 'topics', 'url', 'venue', 'year'])
As you can see, the record contains a lot of information about the paper. Here we mainly care about the citation-related fields.
paper1["title"]
'XLNet: Generalized Autoregressive Pretraining for Language Understanding'
paper1["influentialCitationCount"]
462
len(paper1["citations"])
2616
paper1["citations"][0]
{'arxivId': None, 'authors': [{'authorId': '1519065273', 'name': 'Jia-ying Zhang'}, {'authorId': '3021771', 'name': 'Z. Zhang'}, {'authorId': '49724519', 'name': 'H. Zhang'}, {'authorId': '148431487', 'name': 'Z. Ma'}, {'authorId': '2075311652', 'name': 'Qi Ye'}, {'authorId': '145848393', 'name': 'P. He'}, {'authorId': '2613015', 'name': 'Yangming Zhou'}], 'doi': '10.1016/j.jbi.2020.103628', 'intent': [], 'isInfluential': False, 'paperId': '0031e24987fcdb8cfb0a2b842417c1d0a43a5d06', 'title': 'From electronic health records to terminology base: A novel knowledge base enrichment approach', 'url': 'https://www.semanticscholar.org/paper/0031e24987fcdb8cfb0a2b842417c1d0a43a5d06', 'venue': 'J. Biomed. Informatics', 'year': 2021}
# RoBERTa
paper2 = sch.paper('arXiv:1907.11692', timeout=2)
paper2["title"]
'RoBERTa: A Robustly Optimized BERT Pretraining Approach'
paper2["influentialCitationCount"]
1107
Now that we have the records for both papers, let's take the intersection of the papers that cite them.
Since both papers are influential and have thousands of citations, we can also use the isInfluential flag to keep only the influential citing papers for closer inspection.
co_citations = list(set(x['title'] for x in paper1['citations'] if x['isInfluential'])
                    & set(x['title'] for x in paper2['citations'] if x['isInfluential']))
len(co_citations)
158
print("\n".join(co_citations[:10]))
Exploring Discourse Structures for Argument Impact Classification
STEP: Sequence-to-Sequence Transformer Pre-training for Document Summarization
Parameter-Efficient Transfer Learning with Diff Pruning
Maximal Multiverse Learning for Promoting Cross-Task Generalization of Fine-Tuned Language Models
Defending against Backdoor Attacks in Natural Language Generation
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-based Question Answering
Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation
AR-LSAT: Investigating Analytical Reasoning of Text
memeBot: Towards Automatic Image Meme Generation
ARES: A Reading Comprehension Ensembling Service
That's it. We can now dig into the questions we care about using the literature listed above. I hope this little trick helps you in your work~
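If you want to reuse this, the whole flow fits into one small helper. Here is a minimal sketch built on the same sch.paper call used above (the function name and its defaults are my own, not part of the library):

def find_co_citations(id_a, id_b, influential_only=True, timeout=5):
    """Return sorted titles of papers that cite both id_a and id_b (sketch, not an official API)."""
    paper_a = sch.paper(id_a, timeout=timeout)
    paper_b = sch.paper(id_b, timeout=timeout)

    def citing_titles(paper):
        # Optionally keep only citations that Semantic Scholar marks as influential.
        return {c["title"] for c in paper["citations"]
                if not influential_only or c["isInfluential"]}

    return sorted(citing_titles(paper_a) & citing_titles(paper_b))

# Example usage with the same two papers:
# co_citations = find_co_citations('arXiv:1906.08237', 'arXiv:1907.11692')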