PaECTER: Patent-level Representation Learning using Citation-informed Transformers
CoRR (2024)
Abstract
PaECTER is a publicly available, open-source document-level encoder specific
to patents. We fine-tune BERT for Patents with examiner-added citation
information to generate numerical representations for patent documents. PaECTER
performs better in similarity tasks than current state-of-the-art models used
in the patent domain. More specifically, our model outperforms the next-best
patent-specific pre-trained language model (BERT for Patents) on our patent
citation prediction test dataset on two different rank evaluation metrics.
On average, PaECTER ranks at least one truly similar patent at position 1.32
when evaluated against 25 irrelevant patents. Numerical representations
generated by PaECTER from patent text can be used for downstream tasks such as
classification, tracing knowledge flows, or semantic similarity search.
Semantic similarity search is especially relevant in the context of prior art
search for both inventors and patent examiners. PaECTER is available on Hugging
Face.
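
As a rough illustration of the semantic similarity use case described above, the sketch below loads the encoder through the sentence-transformers library and ranks candidate patents against a query by cosine similarity. The model identifier "mpi-inno-comp/paecter" and the example patent texts are assumptions for illustration; verify the actual identifier on the Hugging Face listing.

```python
# Minimal sketch of prior-art-style semantic similarity search with PaECTER.
# Assumption: the model is hosted on Hugging Face under "mpi-inno-comp/paecter"
# and is compatible with the sentence-transformers library.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("mpi-inno-comp/paecter")  # assumed model id

# Query patent text (e.g. title + abstract) and candidate prior-art texts
# (illustrative placeholders, not real patent documents).
query = "A lithium-ion battery electrode comprising a silicon-carbon composite ..."
candidates = [
    "Method for manufacturing a silicon anode for rechargeable batteries ...",
    "Apparatus for wireless power transfer between mobile devices ...",
    "Graphite-silicon composite electrode materials and preparation thereof ...",
]

# Encode documents into dense vectors and rank candidates by cosine similarity.
query_emb = model.encode(query, convert_to_tensor=True)
cand_embs = model.encode(candidates, convert_to_tensor=True)
scores = util.cos_sim(query_emb, cand_embs)[0]

for score, text in sorted(zip(scores.tolist(), candidates), reverse=True):
    print(f"{score:.3f}  {text[:60]}")
```

The same embeddings can be reused for the other downstream tasks mentioned in the abstract, such as classification or tracing knowledge flows.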