-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathStemming_Lemmatization.py
More file actions
74 lines (62 loc) · 3.07 KB
/
Stemming_Lemmatization.py
File metadata and controls
74 lines (62 loc) · 3.07 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
# -*- coding: utf-8 -*-
"""
Created on Sat Sep 25 20:26:40 2021
@author: Aditi.Dhamat
"""
#numpy does vectorized calculations and is faster than normal
# python operations
import numpy as np
#%timeit list1 = range(1,1000)== 1m loops(millisec)
#%timeit list2 = np.arange(1,1000)= 1lakh loops (microsecs)
#NLTK
import nltk
#nltk.download()
from nltk.stem import PorterStemmer
paragraph = """Thank you all so very much.
Thank you to the Academy. Thank you to all of you in
this room. I have to congratulate the other
incredible nominees this year. The Revenant was the
product of the tireless efforts of an unbelievable
cast and crew. First off, to my brother in this
endeavor, Mr. Tom Hardy. Tom, your talent on screen
can only be surpassed by your friendship off screen.
Thank you for creating a transcendent cinematic experience.
Thank you to everybody at Fox and New Regency …
my entire team. I have to thank everyone from the
very onset of my career … To my parents; none of this
would be possible without you. And to my friends,
I love you dearly; you know who you are.And lastly,
I just want to say this: Making The Revenant was
about man's relationship to the natural world.
A world that we collectively felt in 2015 as the
hottest year in recorded history. Our production
needed to move to the southern tip of this planet
just to be able to find snow. Climate change is real,
it is happening right now. It is the most urgent
threat facing our entire species, and we need to work
collectively together and stop procrastinating. We
need to support leaders around the world who do not
speak for the big polluters, but who speak for all of
humanity, for the indigenous people of the world, for
the billions and billions of underprivileged people
out there who would be most affected by this. For our
children’s children, and for those people out there
whose voices have been drowned out by the politics of
greed. I thank you all for this amazing award tonight.
Let us not take this planet for granted. I do not
take tonight for granted. Thank you so very much."""
sentences = nltk.sent_tokenize(paragraph)
words = nltk.word_tokenize(paragraph)
#Stemming
stemmer = PorterStemmer()
stem_op = []
for i in range(len(words)):
words_stemed = stemmer.stem(words[i])
stem_op.append(words_stemed)
#Lemmatization
from nltk.stem import WordNetLemmatizer
lemma = WordNetLemmatizer()
lem_words = []
for i in range(len(words)):
words_lem = lemma.lemmatize(words[i])
lem_words.append(words_lem)