21 commits
2d80ba5
Future Implementations for classes - Measure, Money, and Date (#258)
ngachchi Apr 22, 2025
c498731
update jenkins cache
mgrafu Apr 22, 2025
3b350a1
Merge branch 'main' into staging_hi_tn
mgrafu Apr 22, 2025
2e6d4e8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 22, 2025
9588396
Potential fix for code scanning alert no. 821: Unused local variable
mgrafu Apr 23, 2025
714c1cc
Hindi TN Future Implementations 2.0. - Fraction, Measure and Time (#310)
ngachchi Aug 27, 2025
68529fd
Hindi TN 2.0 - Telephone class integration from staging branch (#320)
shreeshd-tn Sep 9, 2025
eb7b3e6
Rebase Hindi TN update: Fix Jenkinsfile for CI (#325) (#331)
shreeshd-tn Oct 10, 2025
dd0b8b7
Hindi TN: Ordinal Implementation (#343)
shreeshd-tn Oct 17, 2025
96ba6a2
Hindi TN: Main to staging Fix + Cardinals (leading zero update) (#348)
shreeshd-tn Oct 22, 2025
83783a5
Merge branch 'main' into staging_hi_tn
mgrafu Oct 22, 2025
5e89a81
debug file issue
mgrafu Oct 22, 2025
262cd6e
debug ordinals error
mgrafu Oct 22, 2025
f364fbb
ci debug
mgrafu Oct 22, 2025
500e1fc
revert to original suffixes for ordinals
mgrafu Oct 22, 2025
aa22d29
CI fix: Missing init file (#350)
shreeshd-tn Oct 23, 2025
df5b3dc
HI TN: Staging branch cleanup for main merge (#355)
shreeshd-tn Oct 31, 2025
6dc912f
Cache date change (#356)
shreeshd-tn Oct 31, 2025
120db28
Hindi TN: Address Class (context + structural) (#359)
shreeshd-tn Jan 12, 2026
0279602
leading zero and formal/informal year fixes (#378)
shreeshd-tn Jan 20, 2026
f5902f8
Merge branch 'main' into staging_hi_tn
mgrafu Jan 27, 2026
2 changes: 1 addition & 1 deletion Jenkinsfile
Original file line number Diff line number Diff line change
@@ -27,7 +27,7 @@ pipeline {
 HY_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/03-12-24-0'
 MR_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/03-12-24-1'
 JA_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/10-17-24-1'
-HI_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/10-31-25-0'
+HI_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/01-16-26-0'
 DEFAULT_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/06-08-23-0'
 }
 stages {
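The cache directory names in this hunk look like a date followed by a revision counter (for example `01-16-26-0`). The sketch below assumes that `MM-DD-YY-N` reading, which the Jenkinsfile itself does not document; `parse_cache_dir` is a hypothetical helper, not part of the PR.

```python
from datetime import date, datetime
from pathlib import PurePosixPath
from typing import Tuple


def parse_cache_dir(path: str) -> Tuple[date, int]:
    """Split a grammar cache path into (date, revision).

    Assumes the last path component is MM-DD-YY-N; this naming scheme is
    inferred from the values in the Jenkinsfile, not documented there.
    """
    name = PurePosixPath(path).name            # e.g. "01-16-26-0"
    stamp, _, revision = name.rpartition("-")  # "01-16-26" and "0"
    return datetime.strptime(stamp, "%m-%d-%y").date(), int(revision)


old = parse_cache_dir("/home/jenkins/TestData/text_norm/ci/grammars/10-31-25-0")
new = parse_cache_dir("/home/jenkins/TestData/text_norm/ci/grammars/01-16-26-0")
print(old, new, new > old)
```

Under that assumption, comparing the parsed tuples gives a quick check that a cache bump moves forward in time.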
@@ -0,0 +1,13 @@
# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
36 changes: 36 additions & 0 deletions nemo_text_processing/text_normalization/hi/data/address/cities.tsv
@@ -0,0 +1,36 @@
अमरावती
ईटानगर
दिसपुर
पटना
रायपुर
पणजी
गांधीनगर
चंडीगढ़
शिमला
रांची
बेंगलुरु
तिरुवनंतपुरम
भोपाल
मुंबई
इम्फाल
शिलांग
आइजोल
कोहिमा
भुवनेश्वर
जयपुर
गंगटोक
चेन्नई
हैदराबाद
अगरतला
लखनऊ
देहरादून
कोलकाता
पोर्ट ब्लेयर
दमन
नई दिल्ली
श्रीनगर
जम्मू
लेह
कारगिल
कवरत्ती
पुडुचेरी
@@ -0,0 +1,48 @@
हाउस
प्लॉट
बूथ
अपार्टमेंट
फ्लैट
यूनिट
टावर
कॉम्प्लेक्स
मंजिल
फ्लोर
ब्लॉक
सेक्टर
फेज
रोड
सड़क
मार्ग
स्ट्रीट
गली
राजमार्ग
ड्राइव
डिस्ट्रिक्ट
बाईपास
हाइवे
पार्कवे
कॉलोनी
नगर
पार्क
एस्टेट
बोलवार्ड
मार्केट
सेंटर
पिन
गांव
पास
ब्रिगेड
नियर
स्क्वेर
मॉल
टॉवर
इंस्टीट्यूट
पिलर
मेट्रो
एवेन्यू
वेस्ट
सामने
पीछे
वीया
आर डी
@@ -0,0 +1,2 @@
street स्ट्रीट
southern सदर्न
@@ -0,0 +1,26 @@
A ए
B बी
C सी
D डी
E ई
F एफ
G जी
H एच
I आई
J जे
K के
L एल
M एम
N एन
O ओ
P पी
Q क्यू
R आर
S एस
T टी
U यू
V वी
W डब्ल्यू
X एक्स
Y वाई
Z ज़ेड
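As an illustration of how a letter table like the one above could be applied, the sketch below spells an ASCII string letter by letter. The mapping is copied from the rows above; `spell_letters` is a hypothetical helper, and the grammar in this PR may consume the file differently (for example through a pynini string map).

```python
# Letter names copied from the table above (A through Z).
LETTER_NAMES = {
    "A": "ए", "B": "बी", "C": "सी", "D": "डी", "E": "ई", "F": "एफ",
    "G": "जी", "H": "एच", "I": "आई", "J": "जे", "K": "के", "L": "एल",
    "M": "एम", "N": "एन", "O": "ओ", "P": "पी", "Q": "क्यू", "R": "आर",
    "S": "एस", "T": "टी", "U": "यू", "V": "वी", "W": "डब्ल्यू",
    "X": "एक्स", "Y": "वाई", "Z": "ज़ेड",
}


def spell_letters(text: str) -> str:
    """Spell each ASCII letter by its Hindi name; other characters pass through."""
    return " ".join(LETTER_NAMES.get(ch.upper(), ch) for ch in text)


print(spell_letters("DLF"))  # डी एल एफ
```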
@@ -0,0 +1,2 @@
- हाइफ़न
/ बटा
36 changes: 36 additions & 0 deletions nemo_text_processing/text_normalization/hi/data/address/states.tsv
@@ -0,0 +1,36 @@
आंध्र प्रदेश
अरुणाचल प्रदेश
असम
बिहार
छत्तीसगढ़
गोवा
गुजरात
हरियाणा
हिमाचल प्रदेश
झारखंड
कर्नाटक
केरल
मध्य प्रदेश
महाराष्ट्र
मणिपुर
मेघालय
मिज़ोरम
नागालैंड
ओडिशा
पंजाब
राजस्थान
सिक्किम
तमिलनाडु
तेलंगाना
त्रिपुरा
उत्तर प्रदेश
उत्तराखंड
पश्चिम बंगाल
अंडमान और निकोबार द्वीप समूह
चंडीगढ़
दादरा और नगर हवेली और दमन और दीव
दिल्ली
जम्मू और कश्मीर
लद्दाख
लक्षद्वीप
पुडुचेरी
@@ -8,4 +8,3 @@ hp हॉर्सपॉवर
 d दिन
 month महीना
 months महीने
-
@@ -0,0 +1 @@
yr वर्ष
@@ -0,0 +1,10 @@
0 ०
1 १
2 २
3 ३
4 ४
5 ५
6 ६
7 ७
8 ८
9 ९
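The ten rows above pair each ASCII digit with its Devanagari counterpart. A minimal sketch of the same mapping in plain Python follows; `to_devanagari_digits` is a hypothetical helper for illustration, not the way the PR loads this file.

```python
# Same ten pairs as the table above, expressed as a translation table.
ASCII_TO_DEVANAGARI = str.maketrans("0123456789", "०१२३४५६७८९")


def to_devanagari_digits(text: str) -> str:
    """Replace every ASCII digit in `text` with its Devanagari counterpart."""
    return text.translate(ASCII_TO_DEVANAGARI)


print(to_devanagari_digits("073"))  # ०७३
```

Non-digit characters are left untouched, so the helper can be run over a whole token such as a PIN code before it reaches a grammar that expects Devanagari digits.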
@@ -6,7 +6,20 @@
३री तीसरी
४था चौथा
४थी चौथी
५वां पाँचवां
५वीं पाँचवीं
६ठा छठा
६ठी छठी
१st फ़र्स्ट
२nd सेकंड
३rd थर्ड
४th फ़ोर्थ
५th फ़िफ्थ
६th सिक्स्थ
७th सेवंथ
८th एटथ
९th नाइंथ
१०th टेंथ
११th इलेवंथ
१२th ट्वेल्फ्थ
१३th थर्टींथ
१४th फोर्टींथ
१५th फिफ्टींथ
18 changes: 18 additions & 0 deletions nemo_text_processing/text_normalization/hi/graph_utils.py
@@ -37,6 +37,24 @@
 HI_SADHE = "साढ़े" # half more (X.5)
 HI_PAUNE = "पौने" # quarter less (0.75)

+# Hindi decimal representations
+HI_POINT_FIVE = ".५" # .5
+HI_ONE_POINT_FIVE = "१.५" # 1.5
+HI_TWO_POINT_FIVE = "२.५" # 2.5
+HI_DECIMAL_25 = ".२५" # .25
+HI_DECIMAL_75 = ".७५" # .75
+
+# Symbol constants
+HI_BY = "बाई"
+LOWERCASE_X = "x"
+UPPERCASE_X = "X"
+ASTERISK = "*"
+HYPHEN = "-"
+SLASH = "/"
+COMMA = ","
+PERIOD = "."
+HI_PERIOD = "।"
+
 NEMO_LOWER = pynini.union(*string.ascii_lowercase).optimize()
 NEMO_UPPER = pynini.union(*string.ascii_uppercase).optimize()
 NEMO_ALPHA = pynini.union(NEMO_LOWER, NEMO_UPPER).optimize()
18 changes: 17 additions & 1 deletion nemo_text_processing/text_normalization/hi/taggers/cardinal.py
@@ -15,7 +15,7 @@
 import pynini
 from pynini.lib import pynutil

-from nemo_text_processing.text_normalization.hi.graph_utils import GraphFst, insert_space
+from nemo_text_processing.text_normalization.hi.graph_utils import NEMO_HI_DIGIT, GraphFst, insert_space
 from nemo_text_processing.text_normalization.hi.utils import get_abs_path


@@ -41,6 +41,11 @@
         self.zero = zero
         self.teens_and_ties = teens_and_ties

+        # Single digit graph for digit-by-digit reading
+        # e.g., "०७३" -> "शून्य सात तीन"
+        single_digit_graph = digit | zero
+        self.single_digits_graph = single_digit_graph + pynini.closure(insert_space + single_digit_graph)
+
         def create_graph_suffix(digit_graph, suffix, zeros_counts):
             zero = pynutil.add_weight(pynutil.delete("०"), -0.1)
             if zeros_counts == 0:
@@ -304,7 +309,7 @@
         graph_leading_zero = zero + insert_space + single_digit
         graph_leading_zero = pynutil.add_weight(graph_leading_zero, 0.5)

         final_graph = (
[Code scanning / CodeQL warning: Variable defined multiple times. This assignment to 'final_graph' is unnecessary as it is redefined before this value is used.]
             digit
             | zero
             | teens_and_ties
@@ -327,6 +332,17 @@
             | graph_ten_shankhs
             | graph_leading_zero
         )
+        self.graph_without_leading_zeros = graph_without_leading_zeros.optimize()
+
+        # Handle numbers with leading zeros by reading digit-by-digit
+        # e.g., "०७३" -> "शून्य सात तीन", "००५" -> "शून्य शून्य पाँच"
+        cardinal_with_leading_zeros = pynini.compose(
+            pynini.accep("०") + pynini.closure(NEMO_HI_DIGIT), self.single_digits_graph
+        )
+        cardinal_with_leading_zeros = pynutil.add_weight(cardinal_with_leading_zeros, 0.5)
+
+        # Full graph including leading zeros - for standalone cardinal matching
+        final_graph = graph_without_leading_zeros | cardinal_with_leading_zeros

         optional_minus_graph = pynini.closure(pynutil.insert("negative: ") + pynini.cross("-", "\"true\" "), 0, 1)

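The leading-zero rule added in this hunk can be pictured without pynini: a token that starts with ० and contains only Devanagari digits is read one digit at a time. The plain-Python sketch below mirrors that behavior under stated assumptions. Only the words शून्य, सात, तीन and पाँच appear in the diff's comments; the remaining digit words are assumed standard Hindi forms, and `read_leading_zero_number` is a hypothetical name.

```python
from typing import Optional

# Digit words. शून्य, तीन, पाँच and सात come from the comments in the diff;
# the others are assumptions (the digit data file is not shown in this PR).
DIGIT_WORDS = {
    "०": "शून्य", "१": "एक", "२": "दो", "३": "तीन", "४": "चार",
    "५": "पाँच", "६": "छह", "७": "सात", "८": "आठ", "९": "नौ",
}


def read_leading_zero_number(token: str) -> Optional[str]:
    """Read `token` digit by digit if it starts with ० and is all digits.

    Returns None otherwise, which stands in for the FST rejecting the input
    so that the regular cardinal graph handles it instead.
    """
    if not token.startswith("०"):
        return None
    if any(ch not in DIGIT_WORDS for ch in token):
        return None
    return " ".join(DIGIT_WORDS[ch] for ch in token)


print(read_leading_zero_number("०७३"))  # शून्य सात तीन
print(read_leading_zero_number("००५"))  # शून्य शून्य पाँच
print(read_leading_zero_number("७३"))   # None
```

In the actual grammar the same restriction is expressed by composing `pynini.accep("०") + pynini.closure(NEMO_HI_DIGIT)` with the digit-by-digit graph, and the 0.5 weight makes this path more costly than an unweighted alternative when both match.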
@@ -59,7 +59,7 @@ def __init__(self, cardinal: GraphFst, deterministic: bool = True):
         super().__init__(name="decimal", kind="classify", deterministic=deterministic)

         graph_digit = cardinal.digit | cardinal.zero
-        cardinal_graph = cardinal.final_graph
+        cardinal_graph = cardinal.graph_without_leading_zeros

         self.graph = graph_digit + pynini.closure(insert_space + graph_digit).optimize()
