TextPredictor by Qiaochu-Song · Pull Request #486 · microsoft/FLAML

Qiaochu-Song · 2022-03-17T18:29:25Z

Add an estimator for TextPredictor.
Add a test for TextPredictor estimator.

liususan091219

Why is the training data passed through kwargs? It's supposed to be passed via X_train.

liususan091219

Rename your estimator to MultiModalEstimator.

liususan091219 · 2022-03-21T23:52:21Z

flaml/ml.py

    ARIMA,
    SARIMAX,
    TransformersEstimator,
+    AGTextPredictorEstimator,


Suggested change

-    AGTextPredictorEstimator,
+    MultiModalEstimator,

Please update all occurrences.

Please update the commit.

liususan091219 · 2022-03-21T23:56:43Z

flaml/model.py

+        from autogluon.text import TextPredictor
+
+        super().__init__(task, **params)
+        self.estimator_class = TextPredictor


I can remove this and initialize the model with TextPredictor instead. Is that better?

liususan091219 · 2022-03-21T23:58:04Z

flaml/model.py

+        }
+        return search_space_dict
+
+    def _init_fix_args(self, automl_fit_kwargs: dict=None):


Why do we need this function? Can we simply remove it?

If we have an AGArgs dataclass in utils and just use its default settings, we can remove this function and simply set self.ag_args = AGArgs() in MultimodalEstimator.fit(). Does that make sense?

Yes, you can implement this, and define a function similar to init_hf_args if you need to validate user input.
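A minimal sketch of the proposal above, assuming AGArgs is a dataclass in flaml/nlp/utils.py. The field names echo identifiers from this PR's diffs (output_dir, text_backbone, multimodal_fusion_strategy); the default values, the allowed-strategy whitelist, and the from_fit_kwargs helper are hypothetical placeholders, not code from the PR or from AutoGluon.

```python
from dataclasses import dataclass, fields
from typing import ClassVar, Optional, Tuple


@dataclass
class AGArgs:
    """Hypothetical container for fixed (non-tuned) AutoGluon settings."""

    output_dir: str = "data/mm_output/"  # placeholder default
    text_backbone: str = "electra_base"  # placeholder default
    multimodal_fusion_strategy: str = "fuse_late"  # placeholder default

    # Placeholder whitelist used only for validation; ClassVar keeps it out of fields().
    ALLOWED_FUSION: ClassVar[Tuple[str, ...]] = ("fuse_late", "fuse_early", "all_text")

    def __post_init__(self):
        # Validate user input, in the spirit of an init_hf_args-style check.
        if self.multimodal_fusion_strategy not in self.ALLOWED_FUSION:
            raise ValueError(
                f"Unsupported multimodal_fusion_strategy: {self.multimodal_fusion_strategy!r}"
            )

    @classmethod
    def from_fit_kwargs(cls, fit_kwargs: Optional[dict] = None) -> "AGArgs":
        """Build AGArgs from user-supplied kwargs, rejecting unknown keys."""
        fit_kwargs = fit_kwargs or {}
        known = {f.name for f in fields(cls)}
        unknown = sorted(set(fit_kwargs) - known)
        if unknown:
            raise ValueError(f"Unknown AGArgs keys: {unknown}")
        return cls(**fit_kwargs)
```

With this in place, self.ag_args = AGArgs() in MultimodalEstimator.fit() covers the default case, and a helper like from_fit_kwargs covers user overrides while still rejecting typos.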

liususan091219 · 2022-03-22T00:01:19Z

test/nlp/test_agtextpredictor.py

+    score = automl.model.estimator.evaluate(test_dataset)
+    print(f"Inference on test set complete, {metric}: {score}")
+    del automl
+    gc.collect()


Add a newline at the end of the file.

liususan091219 · 2022-03-22T00:01:41Z

test/nlp/test_agtextpredictor.py

+        "gpu_per_trial": 0,
+        "max_iter": 2,
+        "time_budget": 50,
+        "task": "mm_multi",


Rename mm_multi -> multimodal-classification.


liususan091219 · 2022-03-22T00:30:26Z

flaml/model.py

+        # train_data = self._kwargs["train_data"]
+        import pandas as pd
+        train_data = pd.concat([X_train, y_train], axis=1)
+        tuning_data = pd.concat([X_train, y_train], axis=1)


You mean X_val, y_val?

I will remove this line since the tuning data is not necessary anymore.

liususan091219 · 2022-03-22T00:35:44Z

flaml/model.py

+
+        self.fix_args = fix_args
+
+    def _init_hp_config(self, text_backbone: str, multimodal_fusion_strategy: str):


Please define cfg through a function inside class AGArgs in flaml/nlp/utils.py, then remove this function.

This _init_hp_config uses AGArgs and self.params to build the hyperparameter dictionary for TextPredictor. If it is removed, the dictionary still has to be assembled inside MultimodalEstimator.fit(). Do you think it is better to remove this function and move that logic into .fit()?

Move this function into AGArgs as a method, because AGArgs is responsible for managing the AG config.
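One way to follow this suggestion is to turn the former _init_hp_config into a method on AGArgs, sketched below. The method name init_hp_config, the key names in the returned dictionary, the default values, and the overlap check are hypothetical; the actual TextPredictor hyperparameter keys are not shown in this PR and are not reproduced here.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class AGArgs:
    """Hypothetical AG config holder; field names follow the _init_hp_config signature."""

    text_backbone: str = "electra_base"  # placeholder default
    multimodal_fusion_strategy: str = "fuse_late"  # placeholder default

    def init_hp_config(self, tuned_params: Dict[str, Any]) -> Dict[str, Any]:
        """Merge the fixed AG settings with the hyperparameters sampled by FLAML.

        The key names are placeholders, not AutoGluon's real hyperparameter keys.
        """
        cfg: Dict[str, Any] = {
            "text_backbone": self.text_backbone,
            "multimodal_fusion_strategy": self.multimodal_fusion_strategy,
        }
        # Refuse to let tuned values silently overwrite the fixed settings.
        clash = sorted(set(cfg) & set(tuned_params))
        if clash:
            raise ValueError(f"Tuned params overlap fixed AG settings: {clash}")
        cfg.update(tuned_params)
        return cfg
```

fit() would then reduce to something like hyperparameters = self.ag_args.init_hp_config(tuned_params), keeping all AG-specific config assembly in one place.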

liususan091219 · 2022-03-22T00:37:39Z

flaml/data.py

 )
 SEQREGRESSION = "seq-regression"
-REGRESSION = ("regression", SEQREGRESSION)
+REGRESSION = ("regression", "mm_regression", SEQREGRESSION)


Rename "mm_regression" -> "multimodal-regression", and define a constant for it.
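A sketch of what the requested constant could look like in flaml/data.py. SEQREGRESSION and the REGRESSION tuple come from the diff above; the names MULTIMODAL_REGRESSION and MULTIMODAL_CLASSIFICATION and the is_regression helper are hypothetical.

```python
# SEQREGRESSION and REGRESSION mirror the diff above; the MULTIMODAL_* names are hypothetical.
SEQREGRESSION = "seq-regression"
MULTIMODAL_CLASSIFICATION = "multimodal-classification"
MULTIMODAL_REGRESSION = "multimodal-regression"

REGRESSION = ("regression", MULTIMODAL_REGRESSION, SEQREGRESSION)


def is_regression(task: str) -> bool:
    """Return True if the task string names any supported regression task."""
    return task in REGRESSION
```

Referencing the constant rather than the raw string means a future rename only touches one line.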

liususan091219 · 2022-03-22T00:39:01Z

flaml/data.py

    SEQCLASSIFICATION,
    MULTICHOICECLASSIFICATION,
    TOKENCLASSIFICATION,
+    "mm_multi",


Can you automatically detect "mm_multi" and "mm_binary", so we don't need these two values anymore?
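A minimal sketch of the automatic detection, assuming the subtype can be inferred from the number of distinct labels in y_train. The helper name is hypothetical; the returned strings are chosen to mirror AutoGluon's "binary"/"multiclass" problem-type names, so a single "multimodal-classification" task could replace the two separate values.

```python
import pandas as pd


def infer_classification_subtype(y) -> str:
    """Return "binary" or "multiclass" from the number of distinct labels (hypothetical helper)."""
    # nunique() ignores NaN by default, so missing labels do not count as a class.
    n_classes = pd.Series(y).nunique()
    if n_classes < 2:
        raise ValueError("Classification needs at least two distinct labels.")
    return "binary" if n_classes == 2 else "multiclass"
```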

liususan091219 · 2022-03-22T16:46:05Z

flaml/model.py

+
+        # train_data = self._kwargs["train_data"]
+        import pandas as pd
+        train_data = pd.concat([X_train, y_train], axis=1)


Please use the estimator's _join method. See TransformersEstimator._join.
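A hypothetical stand-in for a _join-style helper (the body of TransformersEstimator._join is not shown in this PR, so this is not its implementation). It illustrates why joining on X's index is safer than pd.concat([X_train, y_train], axis=1): concat aligns on index labels, so a y_train whose index differs from X_train's yields extra rows filled with NaN, and an unnamed Series becomes a column called 0. The label column name "label" is an assumption.

```python
import pandas as pd


def join_features_and_label(X: pd.DataFrame, y, label_column: str = "label") -> pd.DataFrame:
    """Attach labels to the feature frame positionally, keeping X's index (hypothetical helper)."""
    # Build the label column on X's index; raises ValueError if the lengths differ.
    y_frame = pd.DataFrame({label_column: list(y)}, index=X.index)
    return X.join(y_frame)
```

The same helper could build optional tuning data, e.g. tuning_data = join_features_and_label(X_val, y_val) if X_val is not None else None.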

liususan091219 · 2022-03-22T19:56:52Z

flaml/ml.py

    ARIMA,
    SARIMAX,
    TransformersEstimator,
+    AGTextPredictorEstimator,


Please update the commit.

liususan091219 · 2022-03-22T19:57:10Z

flaml/model.py

+        from autogluon.text import TextPredictor
+
+        super().__init__(task, **params)
+        self.estimator_class = TextPredictor


liususan091219 · 2022-03-22T19:58:08Z

flaml/model.py

+        }
+        return search_space_dict
+
+    def _init_fix_args(self, automl_fit_kwargs: dict=None):


Yes, you can implement this, and define a function similar to init_hf_args if you need to validate user input.

liususan091219 · 2022-03-22T20:00:43Z

flaml/model.py

+
+        self.fix_args = fix_args
+
+    def _init_hp_config(self, text_backbone: str, multimodal_fusion_strategy: str):


Move this function into AGArgs as a method, because AGArgs is responsible for managing the AG config.

liususan091219 · 2022-03-22T20:08:13Z

flaml/model.py

+        save_dir = self.fix_args["output_dir"]
+        label_column = self.fix_args["label_column"]
+        dataset_name = self.fix_args["dataset_name"]
+        ag_model_save_dir = os.path.join(save_dir, f"{dataset_name}_ag_text_multimodal_{text_backbone}\


OK. Can you use the original directory save_dir instead of the modified ag_model_save_dir, so users know where to find the saved model?

bug fix

…new-test2

Varia and others added 19 commits March 15, 2022 15:02

Change readme to trigger test

0a0a4c6

add dependencies for AG

002683f

add user permission to test_notebook_example L81

60a847c

add mlflow dependency to setup

60a9e27

add textpredictor estimator and test

bc7f38d

new estimator, no test file

f9ca56b

Update automl.py

fe0ecbb

Update automl.py

4a52ac7

add test with gc, narrow down mxnet version

30cc834

Merge branch 'test_main' of github.com:Qiaochu-Song/FLAML into test_main

14e6720

skip test for py3.6 and win+py3.8, loose mxnet ver

6b75a73

no ag on windows, remove mlflow dependency

d10945e

no ag on windows, remove mlflow dependency

06f64b2

test with direct return

c9ff3d4

debug without new test

e7b6f6d

w/o os.environ setting in new test, direct return

2307b37

debug, import only in new test

bf3203b

move new test to automl

10c93b2

move new test to test/nlp/

53b5f09

liususan091219 suggested changes Mar 21, 2022


pass data with X_train

ee3cacb

liususan091219 suggested changes Mar 22, 2022


Varia and others added 5 commits March 24, 2022 16:13

pr fixes, debugging

8096a89

update with upstream

fed989b

Rename to MultimodalEstimator, pr fix

c40af7d

remove comment

d0b3b11

Update data.py

30e9f60

bug fix

Varia and others added 27 commits April 14, 2022 14:14

adjust testing data and raise budget

4fa136d

Merge remote-tracking branch 'upstream/main' into new-test2

c5d9914

shrink test toy data and budget

25c1baf

change to regression test

f9d3b22

add metric to kwargs for mm in train_estimator, raise test budget

c1568b4

use valid data if any for early stopping, raise test budget

1e4201d

return to the original budget

9692d4e

fix valid DF checking

1b2cb28

simplify isinstance in ml.py

05941bc

Merge remote-tracking branch 'upstream/main' into new-test2

984d000

reduce text column and budget

74f27b5

use only 4-row toy test data

c8848c7

test 10s budget

7be2c5c

minimize test toy dataset

1c7f7ad

shorter test sentence

be60fa6

give enough test budget

3a29c5b

give enough test budget

543b660

solve conflict

4296129

Merge branch 'mxtextpredictor' of github.com:Qiaochu-Song/FLAML into …

ca30eab

…new-test2

add pytorch backend support

5bd061f

set pytorch backend to default

2b150e7

pytorch backend support only

505c894

solve merge conflict

cd98daf

test remove os and python ver constraints

98ee138

no support for python 3.6

ff8c078

no support for python 3.6 or windows

24a5333

Merge branch 'main' into mxtextpredictor

2aeb563

Qiaochu-Song changed the title from MxTextPredictor to TextPredictor May 20, 2022

qingyun-wu assigned sonichi Nov 1, 2022

thinkall added the wontfix label ("This will not be worked on") Jan 20, 2026




4 participants
