Keyword Extraction in Python with Gensim
Before using Gensim for keyword extraction, you first need to set up the environment. The following steps install the required libraries through Anaconda:
1. Install Anaconda: Download the version of Anaconda that matches your operating system from the official Anaconda website (https://www.anaconda.com/products/individual) and follow the installation guide.
2. Create a virtual environment: Open Anaconda Prompt or a command-line terminal and run the following command to create a new virtual environment (named keyword_extraction here):
conda create -n keyword_extraction python=3.7
3. Activate the virtual environment: Run the following command to activate it:
conda activate keyword_extraction
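4. Install Gensim: Run the following command to install Gensim. The gensim.summarization module used in the sample code was removed in Gensim 4.0, so pin a 3.x release:
conda install -c conda-forge gensim=3.8.3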
5. Install other dependencies: If you need other libraries, install them into the same virtual environment. For example, the following command installs spaCy:
conda install -c conda-forge spacy
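As a quick sanity check after installation, the sketch below tests whether a given Gensim version string belongs to the 3.x series, which still ships the gensim.summarization module. The helper name has_summarization is hypothetical, and it looks only at the major version number.

```python
def has_summarization(version: str) -> bool:
    """Return True for Gensim 3.x version strings, which include gensim.summarization."""
    major = int(version.split(".")[0])
    return major == 3


# Example version strings, chosen for illustration only
for candidate in ("3.8.3", "4.3.2"):
    print(candidate, has_summarization(candidate))
```

Inside the activated environment, the same helper can be called with gensim.__version__ to check the installed release.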
Download Dataset:
Gensim can extract keywords from any text corpus. Here we use the English Wikipedia corpus as an example. You can download the latest compressed XML dump from the Wikimedia dumps site (https://dumps.wikimedia.org/enwiki/).
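Because the dump is a large bzip2-compressed file, it can be worth confirming that the download is readable before handing it to Gensim. The following is a minimal sketch using only the standard library. The helper name peek_dump and the sample file are hypothetical, and the demo runs on a tiny stand-in file rather than a real Wikipedia dump.

```python
import bz2
import os
import tempfile
from itertools import islice


def peek_dump(path: str, n_lines: int = 5) -> list:
    """Return the first n_lines of a bz2-compressed text file without reading all of it."""
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in islice(handle, n_lines)]


# Self-contained demo: write a tiny compressed stand-in file, then peek at it
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.xml.bz2")
    with bz2.open(sample_path, mode="wt", encoding="utf-8") as handle:
        handle.write("<mediawiki>\n<page>\n<title>Example</title>\n</page>\n</mediawiki>\n")
    first_lines = peek_dump(sample_path, n_lines=2)
    print(first_lines)
```

For a real download, pass the path of the dump file to peek_dump. A corrupted or incomplete archive would typically raise an error while reading.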
Sample source code:
The following is a complete example showing how to use Gensim for keyword extraction:
import logging
from gensim.corpora import WikiCorpus
# gensim.summarization is available in Gensim 3.x only; it was removed in Gensim 4.0
from gensim.summarization import keywords
# Configure logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
# Path to the downloaded Wikipedia dump
wiki_corpus_path = 'path_to_wiki_corpus.xml.bz2'
# Build a WikiCorpus object for parsing the dump; passing dictionary={} skips
# the vocabulary-building pass over the entire corpus, which is not needed here
wiki_corpus = WikiCorpus(wiki_corpus_path, dictionary={})
# Take only the first document; get_texts() yields documents lazily, so this
# avoids loading the whole corpus into memory
document = next(iter(wiki_corpus.get_texts()))
# Join the list of tokens into a single string
document_text = ' '.join(document)
# Extract keywords with Gensim's keywords function; by default (ratio=0.2)
# it returns a newline-separated string of the top-ranked words
extracted_keywords = keywords(document_text)
# Print the extracted keywords
print(extracted_keywords)
Note: replace `path_to_wiki_corpus.xml.bz2` with the path to the Wikipedia dump you downloaded.
This example prints the keywords extracted from the first document. You can adjust the number of extracted keywords (for example, through the ratio or words parameters of the keywords function) or use other text corpora as needed.
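To illustrate adjusting the number of keywords: in Gensim 3.x, the keywords function accepts a words argument to cap the count and split=True to return a list instead of a newline-separated string. The sketch below applies similar post-processing to the default string output, so it runs without Gensim. The helper name top_keywords and the sample string are hypothetical, and the helper assumes the keywords are already ordered from most to least relevant.

```python
def top_keywords(keyword_text: str, limit: int = 10) -> list:
    """Split newline-separated keywords and keep the first `limit` non-empty entries."""
    cleaned = [line.strip() for line in keyword_text.splitlines() if line.strip()]
    return cleaned[:limit]


# Made-up sample in the newline-separated format that keywords() returns by default
sample_output = "language\ncorpus\nkeyword extraction\ntext"
print(top_keywords(sample_output, limit=2))  # prints ['language', 'corpus']

# With Gensim 3.x installed, the library can apply the limit directly:
# extracted_keywords = keywords(document_text, words=5, split=True)
```

Letting the library apply the limit is usually preferable, since it selects the words before multi-word keywords are combined. The helper is mainly useful when only the string output is at hand.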