56 commits
66ccd65
add quiz response
juliarodina Oct 29, 2018
aeaf184
upd
juliarodina Oct 30, 2018
977901d
upd
juliarodina Oct 30, 2018
3e4e16e
add input data
juliarodina Oct 30, 2018
db8f28d
add results for pragmatic segmenter
juliarodina Oct 30, 2018
73a9c37
upd
juliarodina Oct 30, 2018
8128c66
nltk done
juliarodina Nov 2, 2018
cacb067
segmentation done
juliarodina Nov 2, 2018
75dbfee
upd
juliarodina Nov 3, 2018
61834e4
add maxmatch script
juliarodina Nov 3, 2018
f854387
evaluation done
juliarodina Nov 3, 2018
cdab3ca
final changes
juliarodina Nov 3, 2018
8a51e63
hfst practical
juliarodina Nov 13, 2018
b17e3d2
utf8
juliarodina Nov 13, 2018
b0e3122
utf8
juliarodina Nov 13, 2018
df35254
Update segmentation-response.md
juliarodina Nov 13, 2018
ac0df22
Update 2018-komp-ling/practicals/transliteration/transliteration-resp…
juliarodina Nov 13, 2018
e90008f
Merge branch 'master' of https://github.com/juliarodina/ftyers.github.io
juliarodina Nov 13, 2018
e0eedb5
Update hfst-response.md
juliarodina Nov 18, 2018
fb26bbf
Delete hfst-response.txt
juliarodina Nov 18, 2018
4aa553a
Update hfst-response.md
juliarodina Nov 19, 2018
537b20c
Update hfst-response.md
juliarodina Nov 19, 2018
6006efe
Rename hfst-response.md to HFST-RESPONSE.md
juliarodina Nov 19, 2018
ea1b086
transliteration done
juliarodina Nov 20, 2018
552e319
upd response
juliarodina Nov 20, 2018
4200c87
Update transliteration-response.md
juliarodina Nov 20, 2018
edcdff3
quiz2 response
juliarodina Nov 21, 2018
fdfa765
upd response
juliarodina Nov 21, 2018
633d52f
upd response
juliarodina Nov 21, 2018
ab2223e
upd quiz2
juliarodina Nov 21, 2018
c5a4825
upd
juliarodina Nov 21, 2018
f34b6ef
Update quiz-02_response.md
juliarodina Nov 21, 2018
443be2b
Update quiz-02_response.md
juliarodina Nov 21, 2018
0eb9406
Update quiz-02_response.md
juliarodina Nov 21, 2018
8915226
Update quiz-02_response.md
juliarodina Nov 21, 2018
0a7e995
Update quiz-02_response.md
juliarodina Nov 21, 2018
c75de81
Update quiz-02_response.md
juliarodina Nov 21, 2018
f846908
Update quiz-02_response.md
juliarodina Nov 21, 2018
88ab46c
upd
juliarodina Nov 22, 2018
6e4fece
upd
juliarodina Nov 22, 2018
c7acb3d
Update quiz-02_response.md
juliarodina Nov 22, 2018
478c2ea
Update quiz-02_response.md
juliarodina Nov 22, 2018
c7aec15
Update quiz-02_response.md
juliarodina Nov 22, 2018
d27813b
Update quiz-02_response.md
juliarodina Nov 22, 2018
da1945b
unigram model response
juliarodina Dec 10, 2018
7940a9c
Update report.md
juliarodina Dec 10, 2018
eb5bc64
Update report.md
juliarodina Dec 10, 2018
bd1b4d4
Update train.py
juliarodina Dec 10, 2018
8fa13a7
Update report.md
juliarodina Dec 11, 2018
a87f1d3
Create quiz-03_response.md
juliarodina Dec 12, 2018
97e9aca
Update quiz-03_response.md
juliarodina Dec 12, 2018
dc5d913
Update quiz-03_response.md
juliarodina Dec 12, 2018
6daac0c
rename
juliarodina Mar 26, 2019
74e1f25
pr 4
juliarodina Mar 30, 2019
db9f6bf
RENAME RESPONSES PLS JUST GIVE ME MY GRADES
juliarodina Apr 2, 2019
94dbeb8
delete xrenner i have pr4 for both tracks and im not going to do xrenner
juliarodina Apr 2, 2019
235 changes: 235 additions & 0 deletions 2018-komp-ling/practicals/hfst-response/HFST-RESPONSE.md
@@ -0,0 +1,235 @@
1. A simple lexical transducer
"Now, go back to your chv.lexc file and add some more stems, for example пахча "сад, garden", хула "город, city" and канаш "совет, council".
Then recompile and rerun the other steps up to visualisation."
&
2. Continuation classes
"And now run it through hfst-fst2txt to visualise the resulting transducer."

> see "chv.lexc.png"

3. Phonological rules
"Now try out the other arrows with your rule, recompile and look at the output."

1) output for =>
канаш<n><ins>:канашпа
канаш<n><ins>:канашпе
канаш<n><pl><ins>:канашсемпе
пакча<n><ins>:пакчапа
пакча<n><ins>:пакчапе
пакча<n><pl><ins>:пакчасемпе
урам<n><ins>:урампа
урам<n><ins>:урампе
урам<n><pl><ins>:урамсемпе
хула<n><ins>:хулапа
хула<n><ins>:хулапе
хула<n><pl><ins>:хуласемпе

In this case constraint (2) does not apply, so in the context of the rule %{A%} corresponds to either a or e.

2) output for <=
канаш<n><ins>:канашпа
канаш<n><pl><ins>:канашсемпа
канаш<n><pl><ins>:канашсемпе
пакча<n><ins>:пакчапа
пакча<n><pl><ins>:пакчасемпа
пакча<n><pl><ins>:пакчасемпе
урам<n><ins>:урампа
урам<n><pl><ins>:урамсемпа
урам<n><pl><ins>:урамсемпе
хула<n><ins>:хулапа
хула<n><pl><ins>:хуласемпа
хула<n><pl><ins>:хуласемпе

In this case constraint (1) does not apply, so outside the context %{A%} corresponds to either a or e.

3) output for /<=
канаш<n><ins>:канашпе
канаш<n><pl><ins>:канашсемпа
канаш<n><pl><ins>:канашсемпе
пакча<n><ins>:пакчапе
пакча<n><pl><ins>:пакчасемпа
пакча<n><pl><ins>:пакчасемпе
урам<n><ins>:урампе
урам<n><pl><ins>:урамсемпа
урам<n><pl><ins>:урамсемпе
хула<n><ins>:хулапе
хула<n><pl><ins>:хуласемпа
хула<n><pl><ins>:хуласемпе

As stated in the interpretation of this rule type, %{A%} never corresponds to a in the context
and corresponds to either a or e outside the context.
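The three outputs above can be reproduced with a small Python model of the rule operators. This is only a sketch, not how hfst-twolc works internally: it assumes the rule is `%{A%}:а OP BackVow: [ Cns: | %>: ]+ _ ;`, writes the plural suffix with a plain м instead of %{м%}, and the function names (`in_context`, `realisations`, `generate`) are made up for illustration.

```python
# Sets copied from the Sets section of chv.twol (archiphoneme members left out).
BACK_VOWELS = set("ӑаыоуяёю")
CONSONANTS = set("бвгджзклмнпрсҫтфхцчшщйьъ")


def in_context(left: str) -> bool:
    """True if `left` ends in BackVow followed by one or more Cns or '>'."""
    i = len(left)
    while i > 0 and (left[i - 1] in CONSONANTS or left[i - 1] == ">"):
        i -= 1
    skipped = len(left) - i
    return skipped >= 1 and i > 0 and left[i - 1] in BACK_VOWELS


def realisations(op: str, ctx: bool) -> set:
    """Surface symbols allowed for {A} under the rule {A}:а OP context."""
    both = {"а", "е"}
    if op == "=>":    # а only in the context; е is allowed everywhere
        return both if ctx else {"е"}
    if op == "<=":    # in the context it must be а; no limit elsewhere
        return {"а"} if ctx else both
    if op == "/<=":   # а is forbidden in the context; no limit elsewhere
        return {"е"} if ctx else both
    if op == "<=>":   # а in the context, е everywhere else
        return {"а"} if ctx else {"е"}
    raise ValueError(f"unknown operator: {op}")


def generate(form: str, op: str) -> list:
    """Expand the single {A} in `form` and drop the morpheme boundaries."""
    left, _, right = form.partition("{A}")
    options = realisations(op, in_context(left))
    return sorted((left + v + right).replace(">", "") for v in options)
```

For example, `generate("канаш>п{A}", "=>")` gives both канашпа and канашпе, while `generate("канаш>сем>п{A}", "=>")` gives only канашсемпе, because the е of the plural suffix breaks the back-vowel context.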

4. Rule interactions

Added rules to the .twol file for %{м%} and for %{Ă%}:0 after a vowel:

"Non surface {Ă} after vowel"
%{Ă%}:0 <=> [ BackVow: | FrontVow: ] %>: _ ;

"Non surface {м} in plural genitive"
%{м%}:0 <=> _ %>: %{Ă%}: н ;

The transducer is shown in the picture "chv.gen.png", minimised with the command:
$ hfst-minimise chv.gen.hfst | hfst-fst2txt | python3 att2dot.py | dot -Tpng -o chv.gen.png

"What does minimisation do?" It produces a transducer that is equivalent to the original one but has the minimum number of states.
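To illustrate what "equivalent, but with the minimum number of states" means, here is a sketch of state merging by partition refinement (Moore's algorithm) on a plain deterministic acceptor. It makes no claim about the algorithm hfst-minimise actually uses; a transducer can be viewed this way by treating each input:output pair as one symbol. The function name and the toy automaton are made up for illustration.

```python
def minimise(states, alphabet, delta, finals):
    """Map each state to a block id; states in the same block are equivalent.

    `delta` must be total: a dict from (state, symbol) to the next state.
    """
    # Start from the coarsest split: final versus non-final states.
    block = {s: int(s in finals) for s in states}
    while True:
        # Two states stay together only if they are in the same block and
        # every symbol leads them into the same block.
        signature = {
            s: (block[s], tuple(block[delta[(s, a)]] for a in alphabet))
            for s in states
        }
        ids = {}
        refined = {s: ids.setdefault(signature[s], len(ids)) for s in states}
        if len(set(refined.values())) == len(set(block.values())):
            return refined  # no block was split, so the partition is stable
        block = refined


# Toy automaton: states 1 and 2 behave identically and can be merged.
STATES = [0, 1, 2]
ALPHABET = ["a", "b"]
DELTA = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 2, (2, "b"): 1,
}
BLOCKS = minimise(STATES, ALPHABET, DELTA, finals={1, 2})
```

Here `BLOCKS` puts states 1 and 2 into the same block, so the minimal automaton has two states instead of three and accepts the same strings.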

5. More on morphotactics
"What difference do you note?" The flag values are passed on to the following states first, and only then the prefixes themselves.

6. Productive derivation
Command to get .mor file: hfst-invert chv.gen.hfst -o chv.mor.hfst
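Inversion swaps the two sides of the transducer, so the generator (analysis to surface form) becomes an analyser (surface form to analysis). The idea can be sketched in Python on a list of pairs; the two pairs are taken from the outputs in section 3, and the helper names are made up for illustration.

```python
from collections import defaultdict


def invert(pairs):
    """Swap the two sides of every analysis:surface pair."""
    return [(surface, analysis) for analysis, surface in pairs]


def lookup_table(pairs):
    """Group the right-hand sides by their left-hand side."""
    table = defaultdict(set)
    for key, value in pairs:
        table[key].add(value)
    return table


generator = [
    ("урам<n><ins>", "урампа"),
    ("урам<n><pl><ins>", "урамсемпе"),
]
analyser = lookup_table(invert(generator))
```

After this, `analyser["урампа"]` contains the analysis `урам<n><ins>`.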

7. Lexicon construction
First 10 most frequent words from the wiki texts:
33356 Юханшыв
30359 кeрет
30343 шыв
29671 Шыв
27039 км
26485 бассейнe
25745 Раccей
25276 юханшыв
22810 хыпарeпе
It seems there were a lot of texts about rivers...
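A frequency list like this can be produced in a few lines of Python. This is a sketch and not necessarily the way the list above was made: it assumes plain whitespace tokenisation and keeps upper and lower case apart, which matches the fact that Юханшыв and юханшыв are counted separately. The function name `top_words` is made up.

```python
from collections import Counter


def top_words(text, n=10):
    """Return the n most frequent whitespace-separated tokens with counts."""
    return Counter(text.split()).most_common(n)
```

For example, `top_words("шыв Шыв шыв км км км", 2)` returns `[("км", 3), ("шыв", 2)]`.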

8. Evaluation
Coverage: also 0.12% :(
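Coverage here is taken to mean the share of corpus tokens that get at least one analysis. A sketch of that calculation, assuming the analyser is available as some predicate `is_known` (a made-up stand-in for a real lookup):

```python
def coverage(tokens, is_known):
    """Percentage of tokens for which is_known(token) is true."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if is_known(token))
    return 100.0 * known / len(tokens)
```

With a four-token corpus of which two tokens are analysed, `coverage` returns 50.0.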

9. Weighting
$ echo "область" | hfst-lookup -qp chv.surweights.hfst
область область 11,377200

$ echo "облаҫ" | hfst-lookup -qp chv.surweights.hfst
облаҫ облаҫ 10,050300
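The weights are printed with a decimal comma, so they need a little care when read by a script. Below is a sketch of a parser for such output lines; it assumes three whitespace-separated fields per line (input, output, weight) and that a lower weight means a more likely form, as in the tropical semiring HFST uses by default. The function names are made up.

```python
def parse_lookup_line(line):
    """Split one lookup output line into (input, output, weight)."""
    form, output, weight = line.rsplit(None, 2)
    return form, output, float(weight.replace(",", "."))


def best(lines):
    """Return the parsed line with the lowest weight."""
    return min((parse_lookup_line(line) for line in lines), key=lambda r: r[2])
```

Applied to the two results above, `best` picks the form with weight 10.0503.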

All the files I produced while doing this practical are in this folder. As I understand it, there should be a description of the changes I made
to the files, so I'll comment on some parts of chv.lexc and chv.twol.

I. chv.lexc

# ADDED SOME NECESSARY MULTICHAR SYMBOLS
Multichar_Symbols

%<n%> ! Noun
%<pl%> ! Plural
%<nom%> ! Nominative case
%<ins%> ! Instrumental case
%<gen%> ! Genitive case

%<num%> ! Number

%{A%} ! Archiphoneme [а] or [е]
%{Ă%} ! Archiphoneme [ӑ] or [ӗ] or 0
%{м%} ! Archiphoneme [м] or 0
%{с%} ! Non-sonorant consonants
%{л%} ! -н, -л, or -р
%{э%} ! Front vowels
%{а%} ! Back vowels

%{ъ%} ! For loanwords

%<der_лӑх%> ! Suffix -лӐх

%> ! Morpheme boundary

# ADDED LEXICON GUESSER TO THE ROOT
LEXICON Root

Nouns ;
Guesser ;

# ADDED GENITIVE AND NOMINATIVE TO THE CASES LEXICON
LEXICON CASES

%<nom%>:%> # ;
%<ins%>:%>п%{A%} # ;
%<gen%>:%>%{Ă%}н # ;

# EDITED PLURAL RULE TO WORK WITH GENITIVE PLURAL
LEXICON PLURAL

CASES ;
%<pl%>:%>се%{м%} CASES ;

# ADDED LEXICONS SUBST AND DER-N TO WORK WITH PRODUCTIVE DERIVATION
LEXICON SUBST

PLURAL ;

LEXICON DER-N

%<der_лӑх%>:%>л%{Ă%}х SUBST "weight: 1.0" ;

# ADDED THEM INTO LEXICON N
LEXICON N

%<n%>: PLURAL ;
%<n%>: SUBST ;
%<n%>: DER-N ;

# *ADDED THE PART TO HANDLE NUMERAL EXPRESSIONS*

# ADDED LEXICON N/СТЬ TO WORK WITH WEIGHTING OF DIFFERENT SURFACE FORMS
LEXICON N/сть

%<n%>:ҫ SUBST "weight: 0.5" ;
%<n%>%<nom%>:сть # "weight: 1.0" ;

# ADDED SOME WORDS TO NOUNS
LEXICON Nouns

урам:урам N ; ! "street"
пакча:пакча N ; ! "garden"
хула:хула N ; ! "city"
канаш:канаш N ; ! "council"
тӗс:тӗс N ; ! "appearance"
патша:патша N ; ! "tsar"
куҫ:куҫ N ; ! "eye"
патшалӑх:патшалӑх N ; ! "state"
специалист:специалист%{ъ%} N ; ! "specialist"
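How the continuation classes chain together can be sketched in Python: each lexicon entry adds a piece to the analysis side and to the morphophonemic side, then hands over to the next lexicon until `#` is reached. This is a simplified model, not what hfst-lexc does internally; it covers one stem and only the N to PLURAL path, and the `LEXICONS` table and `expand` function are made up for illustration.

```python
# (analysis side, morphophonemic side, continuation lexicon)
LEXICONS = {
    "Nouns": [("урам", "урам", "N")],
    "N": [("<n>", "", "PLURAL")],
    "PLURAL": [("", "", "CASES"), ("<pl>", ">се{м}", "CASES")],
    "CASES": [
        ("<nom>", ">", "#"),
        ("<ins>", ">п{A}", "#"),
        ("<gen>", ">{Ă}н", "#"),
    ],
}


def expand(name):
    """Yield every (analysis, morphophonemic) pair reachable from `name`."""
    if name == "#":  # end of word
        yield ("", "")
        return
    for upper, lower, continuation in LEXICONS[name]:
        for rest_upper, rest_lower in expand(continuation):
            yield (upper + rest_upper, lower + rest_lower)


PAIRS = list(expand("Nouns"))
```

`PAIRS` then holds six pairs (singular and plural, three cases each), for example `урам<n><pl><ins>` paired with `урам>се{м}>п{A}`. The twol rules then turn the morphophonemic side into surface forms.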

II. chv.twol

# ADDED ALL POSSIBLE OUTPUTS FOR NEW ARCHIPHONEMES
Alphabet
а ӑ е ё ӗ и о у ӳ ы э ю я б в г д ж з к л м н п р с ҫ т ф х ц ч ш щ й ь ъ
А Ӑ Е Ё Ӗ И О У Ӳ Ы Э Ю Я Б В Г Д Ж З К Л М Н П Р С Ҫ Т Ф Х Ц Ч Ш Щ Й Ь Ъ
%{A%}:а %{A%}:е
%{Ă%}:ӑ %{Ă%}:ӗ %{Ă%}:0
%{м%}:м %{м%}:0
%{ъ%}:0
%{э%}:0 %{л%}:0 %{с%}:0 %{а%}:0
;

# ADDED SOME SPECIAL ARCHIPHONEMES TO SETS
Sets

BackVow = ӑ а ы о у я ё ю %{ъ%} %{а%} ;
FrontVow = ӗ э и ӳ %{э%} ;
Cns = б в г д ж з к л м н п р с ҫ т ф х ц ч ш щ й ь ъ %{л%} %{с%};
ArchiCns = %{м%} ;

# ADDED SOME RULES (SEE ABOVE)
Rules

"Remove morpheme boundary"
%>:0 <=> _ ;

"Back vowel harmony for archiphoneme {A}"
%{A%}:а <=> BackVow: [ Cns: | %>: ]+ _ ;

"Non surface {Ă} after vowel"
%{Ă%}:0 <=> [ BackVow: | FrontVow: ] %>: _ ;

"Non surface {Ă} in plural genitive"
%{Ă%}:0 <=> %{м%}: %>: _ н ;

"Non surface {м} in plural genitive"
%{м%}:0 <=> _ %>: %{Ă%}: н ;

"Back vowel harmony for archiphoneme {Ă}"
%{Ă%}:ӑ <=> BackVow: [ ArchiCns: | Cns: | %>: ]+ _ ;
except
%{м%}: %>: _ н ;
BackVow: %>: _ ;
4 changes: 4 additions & 0 deletions 2018-komp-ling/practicals/hfst-response/Makefile
@@ -0,0 +1,4 @@
all:
	hfst-lexc chv.lexc -o chv.lexc.hfst
	hfst-twolc chv.twol -o chv.twol.hfst
	hfst-compose-intersect -1 chv.lexc.hfst -2 chv.twol.hfst -o chv.gen.hfst