Using Fuzzywuzzy in Python to calculate the similarity between strings, including Levenshtein distance, Jaro distance, etc.
Environment setup and preparation:
1. Install Python: First, make sure Python is installed. If it is not, download and install the required version from the official website (https://www.python.org/downloads/).
2. Install the Fuzzywuzzy library: Fuzzywuzzy is a Python library for calculating the similarity between strings. Open a terminal or command prompt window and install it with pip, together with the Levenshtein package that the sample code imports:
pip install fuzzywuzzy python-Levenshtein
Dependencies:
- fuzzywuzzy
- Levenshtein
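Before running the sample, it can help to confirm that both dependencies are importable. The following is a minimal sketch using only the standard library; `check_dependencies` is a hypothetical helper name introduced here for illustration, not part of either library.

```python
from importlib.util import find_spec


def check_dependencies(modules=("fuzzywuzzy", "Levenshtein")):
    """Return a mapping of module name to whether it can be imported."""
    # find_spec returns None when a top-level module cannot be found
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, available in check_dependencies().items():
        status = "installed" if available else "missing"
        print(f"{name}: {status}")
```

If either module is reported as missing, rerun the pip command in the same Python environment that will run the sample.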
Sample data:
The following strings can be used to try out the similarity calculations:
```python
string1 = "Hello World"
string2 = "Hello World!"
string3 = "Hello Python"
```
The complete sample code is as follows:
```python
from fuzzywuzzy import fuzz, process  # process is imported but not used in this sample
from Levenshtein import distance

# Sample data
string1 = "Hello World"
string2 = "Hello World!"
string3 = "Hello Python"

# Calculate the similarity between strings
# fuzz.ratio: overall similarity of the two strings
ratio = fuzz.ratio(string1, string2)
print(f"Ratio similarity between '{string1}' and '{string2}' is: {ratio}")

# fuzz.partial_ratio: similarity based on the best-matching part of the longer string
partial_ratio = fuzz.partial_ratio(string1, string2)
print(
    f"Partial ratio similarity between '{string1}' and '{string2}' is: {partial_ratio}"
)

# fuzz.token_sort_ratio: similarity after sorting the words in each string
token_sort_ratio = fuzz.token_sort_ratio(string1, string2)
print(
    f"Token sort ratio similarity between '{string1}' and '{string2}' is: {token_sort_ratio}"
)

# fuzz.token_set_ratio: similarity based on the sets of words in each string
token_set_ratio = fuzz.token_set_ratio(string1, string2)
print(
    f"Token set ratio similarity between '{string1}' and '{string2}' is: {token_set_ratio}"
)

# Levenshtein distance, calculated with the Levenshtein library
levenshtein_distance = distance(string1, string3)
print(
    f"Levenshtein distance between '{string1}' and '{string3}' is: {levenshtein_distance}"
)
```
Code walkthrough:
1. Import the fuzz and process modules from fuzzywuzzy. fuzz provides the similarity functions; process is imported but not used in this sample.
2. Import the distance function from the Levenshtein library to calculate the Levenshtein distance between strings.
3. Define the sample data: string1, string2, and string3.
4. Use fuzz.ratio to calculate the overall similarity between two strings.
5. Use fuzz.partial_ratio to calculate the partial similarity between two strings, that is, how well the shorter string matches the best-matching part of the longer one.
6. Use fuzz.token_sort_ratio to calculate the similarity after the words in each string are sorted, so that word order does not matter.
7. Use fuzz.token_set_ratio to calculate the similarity between the sets of words in the two strings.
8. Use Levenshtein.distance to calculate the Levenshtein distance between strings, that is, the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other.
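To make the difference between the four scorers in steps 4 to 7 more concrete, here is a rough standard-library approximation built on difflib.SequenceMatcher, which Fuzzywuzzy itself falls back to when python-Levenshtein is not installed. The function names ending in `_sketch` are hypothetical, and the preprocessing and window selection are simplified assumptions, so scores may differ from the real library on some inputs.

```python
import re
from difflib import SequenceMatcher


def ratio_sketch(a: str, b: str) -> int:
    """Similarity of two strings as an integer from 0 to 100."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def partial_ratio_sketch(a: str, b: str) -> int:
    """Best score of the shorter string against any equal-length window of the longer one."""
    shorter, longer = sorted((a, b), key=len)
    if not shorter:
        return 0
    width = len(shorter)
    # Simplification: try every window instead of only promising ones
    return max(
        ratio_sketch(shorter, longer[i:i + width])
        for i in range(len(longer) - width + 1)
    )


def _tokens(s: str) -> list:
    """Lowercase the string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", s.lower())


def token_sort_ratio_sketch(a: str, b: str) -> int:
    """Compare the strings after sorting their words, so word order is ignored."""
    return ratio_sketch(" ".join(sorted(_tokens(a))), " ".join(sorted(_tokens(b))))


def token_set_ratio_sketch(a: str, b: str) -> int:
    """Compare the shared words against each string's shared-plus-unique words."""
    set_a, set_b = set(_tokens(a)), set(_tokens(b))
    shared = " ".join(sorted(set_a & set_b))
    combined_a = (shared + " " + " ".join(sorted(set_a - set_b))).strip()
    combined_b = (shared + " " + " ".join(sorted(set_b - set_a))).strip()
    return max(
        ratio_sketch(shared, combined_a),
        ratio_sketch(shared, combined_b),
        ratio_sketch(combined_a, combined_b),
    )


print(ratio_sketch("Hello World", "Hello World!"))             # 96
print(partial_ratio_sketch("Hello World", "Hello World!"))     # 100
print(token_sort_ratio_sketch("Hello World", "World Hello"))   # 100
print(token_set_ratio_sketch("Hello World", "Hello World Python"))  # 100
```

The last two calls show where the token-based scorers help: reordered words and extra words lower the plain ratio but leave the token-based scores at 100.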
Summary:
- The Fuzzywuzzy library provides several methods for calculating the similarity between strings, including ratio, partial_ratio, token_sort_ratio, and token_set_ratio.
- The Levenshtein library provides a function for calculating the Levenshtein distance between strings.
- Choose the method that fits your needs: the Fuzzywuzzy functions return a similarity score from 0 to 100, while Levenshtein.distance returns the number of edits, so use whichever library matches what you need to measure.
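For readers who want to see what the Levenshtein distance measures, the following is a minimal pure-Python sketch of the classic dynamic-programming approach. It is not the Levenshtein library's implementation, and the `normalized_similarity` helper is an assumption added here to show one common way of turning an edit count into a 0.0 to 1.0 score.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character edits needed to turn a into b."""
    # previous[j] is the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Scale the distance to a 0.0-1.0 similarity using the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein_distance(a, b) / longest


print(levenshtein_distance("Hello World", "Hello World!"))   # 1
print(levenshtein_distance("Hello World", "Hello Python"))   # 6
print(normalized_similarity("Hello World", "Hello Python"))  # 0.5
```

The raw distance depends on string length, whereas the normalized value is easier to compare across pairs of different lengths.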