Named Entity Recognition in Python with spaCy


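Environment setup:

1. Make sure Python and pip are installed.
2. Install spaCy and pandas with pip: `pip install spacy pandas`.
3. Download spaCy's small English model: `python -m spacy download en_core_web_sm`.

Required libraries:

- spaCy: performs the named entity recognition.
- pandas: organizes and displays the results.

Model:

No separate dataset is needed. We use "en_core_web_sm", spaCy's small pretrained English pipeline, which is downloaded in step 3 above and includes a named entity recognizer.

Sample data:

We run named entity recognition on the following sentence:

"Apple is looking at buying U.K. startup for $1 billion"

The complete source code is as follows:

```python
import spacy
import pandas as pd


def main():
    # Load the English model
    nlp = spacy.load("en_core_web_sm")

    # Define the sentence to analyze
    sentence = "Apple is looking at buying U.K. startup for $1 billion"

    # Run the pipeline, which includes named entity recognition
    doc = nlp(sentence)

    # Collect the label and text of each recognized entity
    entities = [(entity.label_, entity.text) for entity in doc.ents]

    # Convert the results to a DataFrame and print it
    df = pd.DataFrame(entities, columns=["Label", "Text"])
    print(df)


if __name__ == "__main__":
    main()
```

Running the code should print output similar to the following (the exact entities can vary with the model version):

```
   Label        Text
0    ORG       Apple
1    GPE        U.K.
2  MONEY  $1 billion
```

Here ORG marks an organization, GPE a geopolitical entity such as a country or city, and MONEY a monetary value.

Note: named entity recognition with spaCy requires the matching language model to be installed. The code above uses "en_core_web_sm"; depending on your needs you can choose another model, such as "en_core_web_md" or "en_core_web_lg", download it the same way, and pass its name to `spacy.load`.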
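As a possible follow-up step, the (label, text) pairs produced by the script can be grouped by label, which is handy when a text contains many entities. The sketch below is an illustration only: the helper names `group_by_label` and `extract_entities` are hypothetical and not part of spaCy, and the sample pairs are hand-written in the same shape as the script's results rather than taken from real model output.

```python
from collections import defaultdict


def group_by_label(entities):
    """Group (label, text) pairs into {label: [text, ...]}.

    Hypothetical helper for illustration; not a spaCy API.
    """
    grouped = defaultdict(list)
    for label, text in entities:
        grouped[label].append(text)
    return dict(grouped)


def extract_entities(sentence, model_name="en_core_web_sm"):
    """Run spaCy NER on a sentence and return (label, text) pairs.

    Hypothetical helper; spaCy is imported lazily so that
    group_by_label can be used even where spaCy is not installed.
    """
    import spacy  # assumes spaCy and the named model are installed

    nlp = spacy.load(model_name)
    return [(ent.label_, ent.text) for ent in nlp(sentence).ents]


# Hand-written sample pairs (an assumption, not actual model output).
sample = [("ORG", "Apple"), ("GPE", "U.K."), ("ORG", "Google")]
print(group_by_label(sample))
```

With spaCy and the model installed, `group_by_label(extract_entities(sentence))` would apply the same grouping to real predictions.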
