FeatureDiscoveryModel

class datarobotx.FeatureDiscoveryModel(base_model, remove_udfs=False)

Feature discovery orchestrator

Autopilot orchestration is delegated to the provided base model.

Builds features on secondary datasets before running an autopilot model. Primary and secondary datasets can be provided as pandas DataFrames or AI Catalog entries. Users can also provide the ID of a relationship configuration built using the Python SDK.

Parameters:
  • base_model (AutopilotModel or IntraProjectModel) – Base model for orchestrating Autopilot after feature discovery. Clustering and AutoTS are not supported.

  • remove_udfs (bool) – Whether feature discovery should forgo deriving features using UDFs.

Examples

>>> import pandas as pd
>>> import datarobotx as drx
>>> # Primary (target) dataset and secondary (transactions) dataset
>>> df_target = pd.read_csv(
...     "https://s3.amazonaws.com/datarobot_public_datasets/drx/Lending+Club+Target.csv"
... )
>>> df_transactions = pd.read_csv(
...     "https://s3.amazonaws.com/datarobot_public_datasets/drx/Lending+Club+Transactions.csv"
... )
>>> base_model = drx.AutoMLModel()
>>> model = drx.FeatureDiscoveryModel(base_model)
>>> # Describe how the secondary dataset joins to the primary dataset
>>> transaction_relationship = drx.Relationship(
...     df_transactions,
...     keys="CustomerID",
...     temporal_key="Date",
...     feature_windows=[(-14, -7, "DAY"), (-7, 0, "DAY")],
...     dataset_name="transactions",
... )
>>> model.fit(
...     df_target,
...     target="BadLoan",
...     feature_engineering_prediction_point="Date",
...     relationships_configuration=[transaction_relationship],
... )
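
The feature_windows tuples in the example above can be read as (start, end, unit) offsets relative to the prediction point. The stdlib-only sketch below illustrates that reading with a plain windowed sum. It is not how DataRobot derives features: the window_sum helper, the half-open interval convention, and the sample transactions are all assumptions made for illustration.

```python
from datetime import date, timedelta


def window_sum(transactions, prediction_point, start_days, end_days):
    """Sum amounts dated in [prediction_point + start_days, prediction_point + end_days).

    The half-open interval is an assumption for this sketch, not documented behavior.
    """
    lower = prediction_point + timedelta(days=start_days)
    upper = prediction_point + timedelta(days=end_days)
    return sum(amount for day, amount in transactions if lower <= day < upper)


# Hypothetical (date, amount) transactions for a single customer
transactions = [
    (date(2024, 1, 2), 10.0),   # 13 days before the prediction point
    (date(2024, 1, 10), 25.0),  # 5 days before
    (date(2024, 1, 14), 5.0),   # 1 day before
    (date(2024, 1, 15), 99.0),  # on the prediction point, outside both windows
]
prediction_point = date(2024, 1, 15)

older = window_sum(transactions, prediction_point, -14, -7)   # 10.0
recent = window_sum(transactions, prediction_point, -7, 0)    # 30.0
print(older, recent)
```

Each window yields its own aggregate, which is why two windows over the same secondary dataset can produce distinct derived features.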

Inherited attributes:

base_model

Base model used for fitting

dr_model

DataRobot python client datarobot.Model object for the present champion

dr_project

DataRobot python client datarobot.Project object

Methods:

fit(X, relationships_configuration, *args[, ...])

Fit a feature discovery model

get_derived_features()

Feature discovery derived features

get_derived_sql()

SQL recipes for producing derived features

Inherited methods:

deploy([wait_for_autopilot, name])

Deploy the model into ML Ops

get_params()

Configuration parameters for the intra-project model

predict(X[, wait_for_autopilot])

Make batch predictions using the present champion

predict_proba(X[, wait_for_autopilot])

Calculate class probabilities using the present champion

set_params(**kwargs)

Set configuration parameters for the intra-project model

share(emails)

Share a project with other users.

property base_model: ModelOperator

Base model used for fitting

Returns:

Base model instance

Return type:

AutopilotModel or IntraProjectModel

deploy(wait_for_autopilot=False, name=None)

Deploy the model into ML Ops

Parameters:
  • wait_for_autopilot (bool, optional, default=False) – If True, wait for autopilot to complete before deploying the model. In non-notebook environments, fit() will always block until complete.

  • name (str, optional, default=None) – Name for the deployment. If None, a name will be generated.

Returns:

Resulting ML Ops deployment

Return type:

Deployment
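
As a minimal sketch of the deploy() call, the helper below waits for Autopilot before deploying. The deploy_when_ready name is hypothetical, and model is assumed to be any fitted drx model exposing deploy() with the signature shown above.

```python
def deploy_when_ready(model, deployment_name=None):
    """Deploy the present champion after Autopilot has completed.

    `model` is assumed to be a fitted drx model; this helper is illustrative only.
    """
    # wait_for_autopilot=True blocks until Autopilot completes before deploying.
    # Passing name=None lets a deployment name be generated.
    return model.deploy(wait_for_autopilot=True, name=deployment_name)
```

With the fitted model from the class example, deploy_when_ready(model, "lending-club") would return the resulting Deployment.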

property dr_model: datarobot.Model

DataRobot python client datarobot.Model object for the present champion

Returns:

datarobot.Model object associated with this drx model

Return type:

datarobot.Model

property dr_project: datarobot.Project

DataRobot python client datarobot.Project object

Returns:

datarobot.Project object associated with this drx model

Return type:

datarobot.Project

fit(X, relationships_configuration, *args, target=None, feature_engineering_prediction_point=None, **kwargs)

Fit a feature discovery model

Applies automatic feature engineering and feature selection to the dataset before running the base model. Note that AutoTS and Clustering base models are not supported for feature discovery.

Parameters:
  • X (pandas.DataFrame or str) – Training dataset for challenger models

  • relationships_configuration (Union[str, Relationship, List[Relationship]]) – Secondary dataset(s) relationship configuration. For more complex relationships, users can instead pass the ID of a relationship configuration created using the official DataRobot Python client.

  • *args – Positional arguments to be passed to the base model fit()

  • target (str, optional) – Column name from the dataset to be used as the target variable

  • feature_engineering_prediction_point (str, optional) – Column name of the feature in the target dataset used for time-based joins. Must be set in order to derive time-based features.

  • **kwargs – Keyword arguments to be passed to the base model fit()

Return type:

FeatureDiscoveryModel
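
When a relationship configuration already exists, its ID can be passed as a string instead of Relationship objects. The sketch below shows that call shape; the fit_with_saved_relationships helper and its argument names are hypothetical, and model is assumed to be an unfitted FeatureDiscoveryModel.

```python
def fit_with_saved_relationships(
    model, training_data, relationship_config_id, target, prediction_point=None
):
    """Fit using the ID of a previously created relationship configuration.

    Illustrative helper; only the fit() arguments documented above are used.
    """
    fit_kwargs = {"target": target}
    # Only pass the prediction point when time-based features are wanted.
    if prediction_point is not None:
        fit_kwargs["feature_engineering_prediction_point"] = prediction_point
    return model.fit(
        training_data,
        relationships_configuration=relationship_config_id,
        **fit_kwargs,
    )
```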

get_derived_features()

Feature discovery derived features

Returns:

df – DataFrame containing the derived features from the feature discovery process.

Return type:

FutureDataFrame

get_derived_sql()

SQL recipes for producing derived features

Returns:

String with the SQL code for generating the derived features from the feature discovery process. Use with print() for more readable output.

Return type:

str
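
A short sketch of inspecting the output of feature discovery after fitting. The summarize_discovery helper is hypothetical; it only calls the two methods documented above on a model that is assumed to be fitted.

```python
def summarize_discovery(model):
    """Return the derived features and the SQL recipe that produces them."""
    derived_features = model.get_derived_features()
    derived_sql = model.get_derived_sql()
    # print() renders the newlines in the SQL string, which reads better
    # than the escaped repr shown when echoing the string directly.
    print(derived_sql)
    return derived_features, derived_sql
```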

get_params()

Configuration parameters for the intra-project model

Returns:

config – Configuration object containing the parameters for the intra-project model

Return type:

dict

Notes

Access configuration parameters for the underlying base model by calling get_params() on the base_model attribute.
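
To make the distinction in the note concrete, the sketch below gathers both sets of parameters side by side. The collect_params helper and the dictionary keys it returns are hypothetical.

```python
def collect_params(model):
    """Collect parameters from the wrapper model and from its base model."""
    return {
        # Parameters of the feature discovery (intra-project) model itself
        "feature_discovery": model.get_params(),
        # Parameters of the underlying base model, reached via base_model
        "base_model": model.base_model.get_params(),
    }
```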

predict(X, wait_for_autopilot=False)

Make batch predictions using the present champion

Predictions are calculated asynchronously: the call returns immediately, and the returned DataFrame is populated with data once predictions are complete.

Predictions are made within the project containing the model using modeling workers. For real-time predictions, first deploy the model.

Parameters:
  • X (pandas.DataFrame) – Dataset to be scored - target column can be included or omitted

  • wait_for_autopilot (bool, optional, default=False) – If True, wait for autopilot to complete before making predictions. In non-notebook environments, fit() will always block until complete.

Returns:

Resulting predictions (contained in the column ‘predictions’). Returned immediately and updated automatically when results are complete.

Return type:

FutureDataFrame
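
The "returns immediately, populated later" behavior can be pictured with a toy analogy built on the standard library. This is not the FutureDataFrame implementation or its API: the LazyResult class, its wait() method, and slow_scoring are hypothetical stand-ins used only to show the pattern.

```python
import threading
import time


class LazyResult:
    """Toy result object that is filled in after the constructor returns."""

    def __init__(self, compute):
        self.rows = None  # stays empty until the background work finishes
        self._done = threading.Event()
        worker = threading.Thread(target=self._run, args=(compute,), daemon=True)
        worker.start()

    def _run(self, compute):
        self.rows = compute()
        self._done.set()

    def wait(self, timeout=None):
        """Block until the rows are available (or the timeout elapses)."""
        self._done.wait(timeout)
        return self.rows


def slow_scoring():
    time.sleep(0.1)  # stand-in for remote scoring work
    return [{"predictions": 0.12}, {"predictions": 0.87}]


result = LazyResult(slow_scoring)  # returns without waiting for scoring
rows = result.wait(timeout=5)      # block here only when the data is needed
print(rows)
```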

predict_proba(X, wait_for_autopilot=False)

Calculate class probabilities using the present champion

Only available for classifier and clustering models.

Parameters:
  • X (pandas.DataFrame) – Dataset to compute class probabilities on; target column can be included or omitted

  • wait_for_autopilot (bool, optional, default=False) – If True, wait for autopilot to complete before making predictions. In non-notebook environments, fit() will always block until complete.

Returns:

Resulting predictions; probabilities for each label are contained in the column ‘class_{label}’; returned immediately, updated automatically when results are completed.

Return type:

FutureDataFrame

See also

predict
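
Because the probabilities for each label are returned in a column named class_{label}, the scores for a single class can be selected by building that column name. A minimal sketch, assuming a fitted classifier; the class_scores helper is hypothetical.

```python
def class_scores(model, scoring_data, label):
    """Return the probability column for one class label."""
    probabilities = model.predict_proba(scoring_data, wait_for_autopilot=True)
    # Column naming follows the 'class_{label}' convention documented above.
    return probabilities[f"class_{label}"]
```

For the Lending Club example, the label would be one of the values of the BadLoan target column.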

set_params(**kwargs)

Set configuration parameters for the intra-project model

Parameters:

**kwargs – Configuration parameters to be set or updated for this model.

Returns:

self – IntraProjectModel instance

Return type:

IntraProjectModel

Notes

Configuration parameters for the underlying base model can be set by calling set_params() on the base_model attribute.
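
The sketch below routes parameter updates to the right object: the wrapper model or its base model. The update_params helper and its argument names are hypothetical, and no specific parameter names are assumed; callers supply their own dictionaries.

```python
def update_params(model, wrapper_params=None, base_params=None):
    """Apply parameter updates to the wrapper and/or the underlying base model."""
    if wrapper_params:
        # Parameters for the feature discovery (intra-project) model
        model.set_params(**wrapper_params)
    if base_params:
        # Parameters for the base model must be set on base_model directly
        model.base_model.set_params(**base_params)
    return model
```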

share(emails)

Share a project with other users. Sets the user role as an owner of the project.

Parameters:

emails (Union[str, list]) – An email address, or a list of email addresses, of users to share with
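
A minimal sketch of sharing a project with one or several users. The share_project helper is hypothetical; it normalizes a single address to a list before calling share(), which per the description above grants each recipient the owner role.

```python
def share_project(model, emails):
    """Share the model's project and return the list of recipients used."""
    # Accept either a single address or an iterable of addresses
    recipients = [emails] if isinstance(emails, str) else list(emails)
    model.share(recipients)
    return recipients
```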