ColumnReduceModel
- class datarobotx.ColumnReduceModel(base_model, ranking_ensemble_size=5, initial_retain_ratio=0.95, initial_lives=3)
Column reduction orchestrator
Iteratively trains challenger models on increasingly column-reduced training data until diminishing returns on model performance are reached. Uses Feature Importance Rank Ensembling (FIRE) for column reduction.
Delegates training on column-reduced data to the provided base model. Blenders and frozen models are excluded from champion model consideration.
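The stopping behavior described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `score` and `shrink` are hypothetical callables standing in for "train a challenger and read its metric" and "drop the least important columns", higher scores are assumed to be better, and the lives counter is assumed not to reset when a new champion appears (the reference does not say either way).

```python
from typing import Callable, List, Tuple


def reduce_until_out_of_lives(
    features: List[str],
    score: Callable[[List[str]], float],
    shrink: Callable[[List[str]], List[str]],
    initial_lives: int = 3,
) -> Tuple[List[str], float]:
    """Shrink the feature list until `initial_lives` rounds fail to beat the champion.

    `score` and `shrink` are hypothetical stand-ins, not datarobotx functions.
    """
    champion, champion_score = features, score(features)
    current, lives = features, initial_lives
    while lives > 0 and len(current) > 1:
        current = shrink(current)
        challenger_score = score(current)
        if challenger_score > champion_score:
            # The challenger becomes the new champion; no life is spent.
            champion, champion_score = current, challenger_score
        else:
            # No improvement: spend one life.
            lives -= 1
    return champion, champion_score


# Toy setup: three informative columns and two noise columns.
SIGNAL = {"a", "b", "c"}


def toy_score(cols: List[str]) -> float:
    signal = sum(1 for c in cols if c in SIGNAL)
    noise = len(cols) - signal
    return signal - 0.5 * noise


def drop_last(cols: List[str]) -> List[str]:
    return cols[:-1]


champion, champion_score = reduce_until_out_of_lives(
    ["a", "b", "c", "n1", "n2"], toy_score, drop_last
)
```

In this toy run the noise columns are dropped first, the three-column list becomes champion, and later reductions only spend lives.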
- Parameters:
base_model (AutopilotModel or IntraProjectModel) – Base model to fit on column-reduced training data
ranking_ensemble_size (int, default=5) – Number of top models from the leaderboard to include in the ensemble when computing the median feature importance rank for each feature
initial_retain_ratio (float, default=0.95) – Initial percent (expressed as a decimal) of cumulative feature importance to retain when performing column reduction
initial_lives (int, default=3) – Stopping criteria; number of reduction iterations to complete without establishing a new champion model
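To make ranking_ensemble_size and initial_retain_ratio concrete, here is a minimal sketch of median-rank ensembling followed by a cumulative-importance cutoff. It is one plausible reading of the parameter descriptions, not the library's code: the function name `fire_select`, the input shape (one importance dict per ensemble model), and the use of median importance as the weight for the cumulative cutoff are all assumptions.

```python
from statistics import median
from typing import Dict, List


def fire_select(
    importances_by_model: List[Dict[str, float]],
    retain_ratio: float = 0.95,
) -> List[str]:
    """Return the features kept after median-rank ensembling (hypothetical helper)."""
    features = sorted(importances_by_model[0])

    # Rank features within each model: 1 = most important.
    ranks: Dict[str, List[int]] = {f: [] for f in features}
    for importances in importances_by_model:
        ordered = sorted(features, key=lambda f: importances[f], reverse=True)
        for position, feature in enumerate(ordered, start=1):
            ranks[feature].append(position)

    median_rank = {f: median(r) for f, r in ranks.items()}
    # Assumption: median importance across models weights the cumulative cutoff.
    median_importance = {
        f: median(m[f] for m in importances_by_model) for f in features
    }

    # Best median rank first; ties broken by larger median importance.
    ordered = sorted(features, key=lambda f: (median_rank[f], -median_importance[f]))

    total = sum(median_importance.values())
    kept: List[str] = []
    cumulative = 0.0
    for feature in ordered:
        if kept and cumulative >= retain_ratio * total:
            break
        kept.append(feature)
        cumulative += median_importance[feature]
    return kept


# Three models stand in for the top of the leaderboard.
models = [
    {"age": 1.0, "income": 0.6, "zip": 0.05, "id": 0.01},
    {"age": 0.9, "income": 1.0, "zip": 0.02, "id": 0.03},
    {"age": 1.0, "income": 0.7, "zip": 0.04, "id": 0.0},
]
kept_features = fire_select(models, retain_ratio=0.95)
all_features = fire_select(models, retain_ratio=1.0)
```

With these toy numbers, "age" and "income" together already cover more than 95% of the total median importance, so the two weak columns are dropped; a ratio of 1.0 keeps all four.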
Attributes:
features – List of features used by the current best model
Inherited attributes:
base_model – Base model used for fitting
dr_model – DataRobot python client datarobot.Model object for the present champion
dr_project – DataRobot python client datarobot.Project object
Methods:
fit(*args, **kwargs) – Fit column-reduced challenger models using the underlying base model
run_column_reduction(project_id[, ...]) – Runs feature reduction on the provided project iteratively
Inherited methods:
deploy([wait_for_autopilot, name]) – Deploy the model into ML Ops
get_params() – Configuration parameters for the intra-project model
predict(X[, wait_for_autopilot]) – Make batch predictions using the present champion
predict_proba(X[, wait_for_autopilot]) – Calculate class probabilities using the present champion
set_params(**kwargs) – Set configuration parameters for the intra-project model
share(emails) – Share a project with other users.
- property base_model: ModelOperator
Base model used for fitting
- Returns:
Base model instance
- Return type:
AutopilotModel or IntraProjectModel
- deploy(wait_for_autopilot=False, name=None)
Deploy the model into ML Ops
- Parameters:
wait_for_autopilot (bool, optional, default=False) – If True, wait for autopilot to complete before deploying the model. In non-notebook environments, fit() will always block until complete
name (str, optional, default=None) – Name for the deployment. If None, a name will be generated
- Returns:
Resulting ML Ops deployment
- Return type:
Deployment
- property dr_model: datarobot.Model
DataRobot python client datarobot.Model object for the present champion
- Returns:
datarobot.Model object associated with this drx model
- Return type:
datarobot.Model
- property dr_project: datarobot.Project
DataRobot python client datarobot.Project object
- Returns:
datarobot.Project object associated with this drx.Model
- Return type:
datarobot.Project
- property features
List of features used by the current best model
- Returns:
Column names of features used in current best model
- Return type:
- fit(*args, **kwargs)
Fit column-reduced challenger models using the underlying base model
- Parameters:
args – Arguments to be passed to the base model fit()
kwargs – Keyword arguments to be passed to the base model fit()
- Return type:
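A minimal usage sketch of the constructor route follows. It is wrapped in a function and imports datarobotx lazily because running it requires the package and DataRobot credentials. The function name is hypothetical, and two things are assumptions not confirmed by this page: that AutopilotModel can be constructed with no arguments, and that the base model's fit() accepts a training DataFrame plus a `target` keyword.

```python
def fit_with_column_reduction(train_df, target: str):
    """Wrap a base model in ColumnReduceModel and fit it (sketch only)."""
    # Lazy import: needs the datarobotx package and configured credentials.
    import datarobotx as drx

    # Assumption: default constructor arguments are acceptable.
    base_model = drx.AutopilotModel()
    model = drx.ColumnReduceModel(
        base_model,
        ranking_ensemble_size=5,
        initial_retain_ratio=0.95,
        initial_lives=3,
    )
    # Arguments are forwarded to base_model.fit(); passing the training
    # frame and `target=` is an assumption about that signature.
    model.fit(train_df, target=target)
    return model
```

Once fitting finishes, the features property lists the columns used by the current best model.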
- get_params()
Configuration parameters for the intra-project model
- Returns:
config – Configuration object containing the parameters for the intra-project model
- Return type:
Notes
Access configuration parameters for the underlying base model by calling get_params() on the base_model attribute
- predict(X, wait_for_autopilot=False)
Make batch predictions using the present champion
Predictions are calculated asynchronously: the call returns immediately, and the returned DataFrame is populated with data once predictions are completed.
Predictions are made within the project containing the model using modeling workers. For real-time predictions, first deploy the model.
- Parameters:
X (pandas.DataFrame) – Dataset to be scored - target column can be included or omitted
wait_for_autopilot (bool, optional, default=False) – If True, wait for autopilot to complete before making predictions. In non-notebook environments, fit() will always block until complete
- Returns:
Resulting predictions (contained in the column ‘predictions’). Returned immediately, updated automatically when results are completed.
- Return type:
FutureDataFrame
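The "returns immediately, filled in later" behavior resembles the standard-library future pattern. The sketch below is only an analogy using concurrent.futures; it does not show how FutureDataFrame is implemented, and `slow_score` is a hypothetical stand-in for a remote scoring job.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List


def slow_score(rows: List[float]) -> List[float]:
    """Hypothetical stand-in for a scoring job that takes a while."""
    time.sleep(0.05)
    return [2 * r for r in rows]


with ThreadPoolExecutor(max_workers=1) as pool:
    # submit() returns a Future right away; the work runs in the background.
    pending: Future = pool.submit(slow_score, [1.0, 2.0, 3.0])
    # Other work could happen here while scoring is in flight.
    predictions = pending.result()  # blocks until the results are ready
```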
- predict_proba(X, wait_for_autopilot=False)
Calculate class probabilities using the present champion
Only available for classifier and clustering models.
- Parameters:
X (pandas.DataFrame) – Dataset to compute class probabilities on; target column can be included or omitted
wait_for_autopilot (bool, optional, default=False) – If True, wait for autopilot to complete before making predictions. In non-notebook environments, fit() will always block until complete
- Returns:
Resulting predictions; probabilities for each label are contained in the column ‘class_{label}’; returned immediately, updated automatically when results are completed.
- Return type:
FutureDataFrame
- classmethod run_column_reduction(project_id, ranking_ensemble_size=5, initial_retain_ratio=0.95, initial_lives=3)
Runs feature reduction on the provided project iteratively
- Parameters:
project_id (str) – ID of an existing project to fit on column-reduced training data
ranking_ensemble_size (int, default=5) – Number of top models from the leaderboard to include in the ensemble when computing the median feature importance rank for each feature
initial_retain_ratio (float, default=0.95) – Initial percent (expressed as a decimal) of cumulative feature importance to retain when performing column reduction
initial_lives (int, default=3) – Stopping criteria; number of reduction iterations to complete without establishing a new champion model
- Returns:
Model object that can be used to make predictions and deploy models
- Return type:
Examples
>>> import datarobotx as drx
>>> project_id = "123456"
>>> colreduce_model = drx.ColumnReduceModel.run_column_reduction(project_id)
- set_params(**kwargs)
Set configuration parameters for the intra-project model
- Parameters:
**kwargs – Configuration parameters to be set or updated for this model.
- Returns:
self – IntraProjectModel instance
- Return type:
IntraProjectModel
Notes
Configuration parameters for the underlying base model can be set by calling set_params() on the base_model attribute