Python Wrapper for MLJAR API
This wrapper enables you to run model search and tuning with MLJAR with two lines of code! It is super easy and super powerful.
from mljar import Mljar
model = Mljar(project='My awesome project', experiment='First experiment')
model.fit(X,y)
model.predict(X)
How to install
You can install mljar with pip:
pip install -U mljar
or from source code:
python setup.py install
How to use it
- Create an account at mljar.com and log in.
- Go to your user settings (top right corner).
- Get your token, for example 'exampleexampleexample'.
- Set the environment variable MLJAR_TOKEN to your token value.
- That's all, you are ready to use MLJAR in your Python code!
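Before starting an experiment it can help to confirm that the token is visible to Python. The following is a minimal sketch; the helper name get_mljar_token is hypothetical and is not part of the mljar package. Only the MLJAR_TOKEN variable name comes from the steps above.

```python
import os


def get_mljar_token(environ=os.environ):
    """Return the MLJAR token from the environment, or raise if it is missing.

    Hypothetical helper for illustration; it is not part of the mljar package.
    """
    token = environ.get("MLJAR_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "MLJAR_TOKEN is not set; copy the token from your user settings "
            "and export it before running your script."
        )
    return token
```

Calling get_mljar_token() at the top of a script fails early with a readable message instead of failing later during the first API call.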
To see examples, please go to mljar-examples on GitHub.
The wrapper is meant to make interacting with MLJAR easy and intuitive. There are, however, a few arguments that can be set:
- project - string with project title
- experiment - string with experiment title
- metric - string with the metric that will be used for optimization. For classification valid metrics are:
logloss is the default for binary classification. For regression tasks valid metrics are:
mae, the default is
- algorithms - list of selected algorithms that will be checked and tuned. For binary classification the following algorithms are available:
- xgb which is for Xgboost
- lgb which is for LightGBM
- mlp which is for Deep Neural Network
- rfc which is for Random Forest
- etc which is for Extra Trees
- knnc which is for k-Nearest Neighbors
- logreg which is for Logistic Regression
The default selection for classification is [xgb, lgb, mlp]. For regression tasks the following algorithms are available:
- xgbr which is for Xgboost
- lgbr which is for LightGBM
- rfr which is for Random Forest
- etr which is for Extra Trees
The default selection for regression is
- validation_kfolds - the number of folds to be used in validation. It is ignored if validation_train_split is not None or if a validation dataset is provided. It can be a number from 2 to 15.
- validation_shuffle - boolean that specifies whether to shuffle samples before training. It is used in k-fold CV and in the validation split. The default is True. It is ignored when validating with a separate dataset.
- validation_stratify - boolean that decides whether samples are divided into folds with the same class distribution. It is ignored in regression tasks. The default is True.
- validation_train_split - the fraction of the input data that should be used for training when the training dataset is split into train and validation parts. It should be in the (0.05, 0.95) range. If it is not None, then validation_kfolds is ignored.
- tuning_mode - string that sets how many models will be checked for each algorithm. Available modes:
- Normal - 5-10 models will be checked
- Sport - 10-15 models will be checked
- Insane - 15-20 models will be checked
The default is Normal mode.
- create_ensemble - boolean that decides whether an ensemble of all available models will be created. The default is
- single_algorithm_time_limit - integer that sets how much time (in minutes) will be spent on training a single algorithm. The default value is 5 minutes.
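The three validation options described above override one another, which is easy to get wrong. The sketch below restates the documented rules as a small local check. The function choose_validation is hypothetical and is not part of the mljar package; treating (0.05, 0.95) as an open interval is an assumption based on the notation used above.

```python
def choose_validation(validation_kfolds=None,
                      validation_train_split=None,
                      has_validation_data=False):
    """Return which validation scheme applies, following the rules listed above.

    Hypothetical helper for illustration; it is not part of the mljar package.
    """
    # A separate validation dataset overrides both other settings.
    if has_validation_data:
        return "separate dataset"

    # A train split, when given, overrides validation_kfolds.
    if validation_train_split is not None:
        # Assumption: the documented (0.05, 0.95) range excludes its endpoints.
        if not 0.05 < validation_train_split < 0.95:
            raise ValueError("validation_train_split must be in (0.05, 0.95)")
        return "train split"

    # Otherwise k-fold CV is used; the documented range is 2 to 15 folds.
    if validation_kfolds is None or not 2 <= validation_kfolds <= 15:
        raise ValueError("validation_kfolds must be a number from 2 to 15")
    return "k-fold CV"
```

For example, choose_validation(validation_kfolds=5, validation_train_split=0.8) reports a train split, because the split ratio takes precedence over the fold count.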
- X matrix with training attributes, it can be
- y vector with target values, it can be
- validation_data tuple (X,y) with validation data. If set to None, then the k-fold CV or train split validation will be used. Default is set to None.
- wait_till_all_done - boolean which decides whether the fit function will wait until the experiment is done. The default is set to
- X matrix which will be used for computing predictions, it can be
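The X, y and validation_data arguments listed above can be combined into one short workflow. This is only a sketch: the function fit_with_holdout is hypothetical, and it assumes that validation_data is accepted as a keyword argument of fit, which the argument list suggests but does not state explicitly.

```python
def fit_with_holdout(model, X_train, y_train, X_valid, y_valid, X_new):
    """Fit with a separate validation set, then predict on new data.

    Hypothetical helper for illustration. `model` is expected to be an
    Mljar instance, e.g. Mljar(project='My awesome project',
    experiment='First experiment').
    """
    # validation_data is documented as a tuple (X, y); passing it as a
    # keyword argument of fit is an assumption.
    model.fit(X_train, y_train, validation_data=(X_valid, y_valid))
    # predict takes the matrix to compute predictions for.
    return model.predict(X_new)
```

Because a separate validation dataset is supplied, the k-fold and train split settings are not used for this experiment.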