[WIP] Refactor madx_run_clm.py by haileyschoelkopf · Pull Request #27 · bigscience-workshop/multilingual-modeling

haileyschoelkopf · 2022-06-27T14:08:17Z

Current changes: just removing some unused / commented-out code from madx_run_clm.py. There is more, but I wasn't sure why certain parts were commented out.

We'll need to refactor the script as well once we add new finetuning strategies.

I also wonder whether it would be helpful to turn language experiments into a single packaged script (train tokenizer + adapt model + possibly run eval?), so that it is easier to onboard others and have them run experiments.
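A rough sketch of what such a wrapper could look like, purely for illustration. `madx_run_clm.py` and `eval.py` are named in this thread; `train_tokenizer.py` is a hypothetical placeholder, and every script's arguments are left to the caller because the real flags are not given here.

```python
import subprocess
import sys

# Step name -> script. "train_tokenizer.py" is a hypothetical name;
# the actual entry points in the repo may differ.
STEPS = [
    ("train tokenizer", "train_tokenizer.py"),
    ("adapt model", "madx_run_clm.py"),
    ("run eval", "eval.py"),
]


def build_commands(step_args, steps=STEPS):
    """Build one command per step, in pipeline order.

    step_args maps a step name to its argument list; any step that is
    missing from step_args is skipped (e.g. to leave out the eval).
    """
    return [
        [sys.executable, script, *step_args[name]]
        for name, script in steps
        if name in step_args
    ]


def run_pipeline(step_args, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in build_commands(step_args):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)
```

Keeping the steps in a plain list means each script can still be run on its own, which leaves room for the separate-scripts workflow as well.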

yongzx · 2022-06-28T05:34:13Z

> I also wonder whether it would be helpful to turn language experiments into a single packaged script (train tokenizer + adapt model + possibly run eval?), so that it is easier to onboard others and have them run experiments.

Maybe in the future, but at least during these sprints, let's keep them separate.

lintangsutawika · 2022-06-29T14:03:22Z

We might also want to reconfigure the file structure?

My thoughts would be something like:

multilingual-modeling/
- lang-adapt/
    - README.md
    - scripts/
    - finetune/
        - *.py
    - *.py
- evaluation/
    - eval_xnli/
    - eval_exp_sentence_retreival_eval/

yongzx · 2022-06-29T14:38:08Z

Yeah, the structure is a mess right now. There's too much duplication (e.g., on the eval side, we actually don't need eval_xnli) left over from legacy code.

I am working on it right now.

yongzx · 2022-06-29T14:51:53Z

1fb6504

multilingual-modeling/
- lang-adapt/
    - README.md
    - scripts/
    - *.py
- evaluation/
    - wikiann/  #scripts
    - xnli/  #scripts
    - eval.py
    - README.md
- exp_sentence_retreival_eval/

for now.

@lintangsutawika What do you have in mind in the finetune/ folder?

haileyschoelkopf · 2022-07-01T12:34:19Z

This makes sense to me, but I had problems downloading XNLI when there was a folder called "xnli" in the same path. Renaming it to anything else (xnli_scripts, etc.) fixes the problem.
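One plausible cause, assuming XNLI is fetched by name with Hugging Face `datasets.load_dataset("xnli")` (the thread does not confirm this): a local directory whose name matches the dataset name can be picked up in place of the remote dataset. A small stdlib-only check can flag such a collision before the download runs. The helper name `find_shadowing_dirs` is hypothetical.

```python
from pathlib import Path


def find_shadowing_dirs(dataset_names, workdir="."):
    """Return the dataset names that collide with a directory in workdir."""
    base = Path(workdir)
    return [name for name in dataset_names if (base / name).is_dir()]


if __name__ == "__main__":
    # "xnli" and "wikiann" are the dataset names mentioned in this thread.
    clashes = find_shadowing_dirs(["xnli", "wikiann"])
    if clashes:
        print("Rename these folders before downloading:", ", ".join(clashes))
```

Run from the directory where the evaluation is launched, this reports a folder named `xnli` but not one named `xnli_scripts`.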

yongzx · 2022-07-01T13:50:53Z

@haileyschoelkopf Fixed in a8486d4 (using scripts_* instead).

lintangsutawika · 2022-07-01T16:14:46Z

@yongzx I'm not sure. I think parameter-efficient finetuning should be included in lang-adapt/.

remove obsolete code

5ff8352

haileyschoelkopf wants to merge 1 commit into master from hailey/cleanup


3 participants
