Word Tokenization in Python with NLTK: Splitting Text into Individual Words

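Tokenizing text with NLTK (the Natural Language Toolkit) in Python takes a little setup:

1. Install the NLTK library with `pip install nltk`.
2. Download the data NLTK needs for tokenization. NLTK distributes its datasets separately from the library, and the tokenizer relies on one of them, so it must be downloaded before `word_tokenize` can be used:

```python
import nltk

nltk.download('punkt')
```

This downloads the Punkt tokenizer models, named `punkt`. In recent NLTK releases (3.9 and later), `word_tokenize` loads a resource named `punkt_tab` instead; if you get a `LookupError` mentioning `punkt_tab`, run `nltk.download('punkt_tab')`.

Once setup is complete, we can tokenize text with NLTK's `word_tokenize` function.

Sample data: we will use the following text as input.

```python
text = "Natural Language Processing (NLP) is a subfield of artificial intelligence that focuses on the interaction between computers and humans using natural language."
```

Now we can write the complete program that tokenizes the text and prints the result:

```python
import nltk
from nltk.tokenize import word_tokenize

# Download the tokenizer data (later runs see it is already installed and skip the download)
nltk.download('punkt')

# Input text
text = "Natural Language Processing (NLP) is a subfield of artificial intelligence that focuses on the interaction between computers and humans using natural language."

# Split the text into word and punctuation tokens
tokens = word_tokenize(text)

# Print the list of tokens
print(tokens)
```

Output:

```
['Natural', 'Language', 'Processing', '(', 'NLP', ')', 'is', 'a', 'subfield', 'of', 'artificial', 'intelligence', 'that', 'focuses', 'on', 'the', 'interaction', 'between', 'computers', 'and', 'humans', 'using', 'natural', 'language', '.']
```

Note that punctuation such as `(`, `)` and the final `.` becomes separate tokens. The code above walks through the complete process of word tokenization with NLTK in Python.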
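To make concrete what `word_tokenize` does to this sentence, here is a minimal sketch that approximates it with a single regular expression from Python's standard library. This is a swapped-in technique, not how NLTK works internally (per its documentation, `word_tokenize` combines the Punkt sentence tokenizer with an improved Treebank-style word tokenizer). The names `simple_tokenize` and `TOKEN_PATTERN` are hypothetical and not part of NLTK.

```python
import re

# Hypothetical pattern (not part of NLTK): a run of word characters is one token,
# and every other non-whitespace character (e.g. "(", ")" or ".") is its own token.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(text: str) -> list[str]:
    """Rough regex approximation of word tokenization (hypothetical helper)."""
    return TOKEN_PATTERN.findall(text)


text = ("Natural Language Processing (NLP) is a subfield of artificial intelligence "
        "that focuses on the interaction between computers and humans using natural language.")

# For this sentence the tokens match the word_tokenize output listed above.
print(simple_tokenize(text))

# Where the approximation breaks down: contractions are split at the apostrophe,
# giving ['I', 'don', "'", 't', 'know', '.'], whereas word_tokenize yields 'do' and "n't".
print(simple_tokenize("I don't know."))
```

The regex is enough for plain sentences like the sample, but `word_tokenize` also handles contractions, quotes and sentence boundaries, which is why the NLTK function is the better choice for real text.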