About#

Project goals#

drx intends to explore and prototype a programmatic DataRobot experience that is:

  1. Declarative and simple by default

    • Streamlines common workflows

    • Uses broadly familiar syntax and verbiage where possible

  2. Unobtrusively customizable

    • Allows default behaviors and configuration to be easily overridden…

    • …but not at the expense of complicating the common experience

  3. Experimental

    • Accelerates user experimentation and discovery of productionizable artifacts

    • Offers new and yet-to-be-proven abstractions and concepts

    • Is not to be productionized itself

Configuration of abstractions#

DataRobot provides dozens of settings and configuration options that govern execution behavior. drx aims to provide a streamlined default experience without compromising on the ability to customize, configure or override.

To this end, drx abstractions are typically structured in the following manner:

  1. The most important configuration parameters (typically limited to ~7) are exposed and documented in the abstraction’s constructor as keyword arguments.

  2. Wherever possible, these keyword arguments are optional.

  3. Base models for core problem types can be configured using the DRConfig configuration object class. This class:

    • Enables easy, inline discovery and specification of the many DataRobot parameters using autocomplete, docstrings, and nesting of parameters by category

    • Preserves an equally interchangeable flat dictionary representation.

    • Maintains the same names and descriptions used in the DataRobot REST API for all parameters that get passed to DR.

    See the reference documentation for examples of usage.

  4. When a parameter name is known in advance, it can be passed directly to the abstraction’s constructor (e.g. AutoMLModel()) or after construction using the set_params() method:

    • As additional optional keyword arguments: AutoMLModel(project_description='foo') or set_params(project_description='foo')

    • Within a dictionary to be unpacked: AutoMLModel(**my_dict_of_params) or set_params(**my_dict_of_params)

Examples#

Configuration objects can be interchanged with dictionaries, trading off ease of discovery (e.g. autocomplete + documentation capabilities) for succinctness.

# -------------------------------------------------------------------
# Configuration discovered through sequential, shift-tab autocomplete
# -------------------------------------------------------------------
config_1 = DRConfig() # or construct by calling get_params() on an abstraction
config_1.Modeling.AutoML.blend_best_models = False

# ----------------------------------------------------------
# Configuration discovered by reading the HTML documentation
# ----------------------------------------------------------
config_2 = drx.ModelingAutoMLConfig(blend_best_models=False)

# -----------------------------------------------------------
# Direct configuration (e.g. parameter name known in advance)
# -----------------------------------------------------------
config_3 = {'blend_best_models': False}

# ----------
# Equivalent
# ----------
model_1 = AutoMLModel(**config_1)
model_2 = AutoMLModel(**config_2)
model_3 = AutoMLModel(**config_3)
model_4 = AutoMLModel(blend_best_models=False)

In general, parameters have unique names, which allows use of the flat representation seen with config_3 and model_4. If parameter names become ambiguous in the future, configurations that use the duplicated names will likely need to use the nested representations.
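To illustrate how a nested configuration object can stay interchangeable with a flat dictionary, here is a minimal, self-contained sketch. It does not use drx and is not how DRConfig is implemented; ConfigNode, build_model, and the category names modeling, automl, and project are hypothetical and chosen only for illustration. The parameter names blend_best_models and project_description are borrowed from the examples above.

```python
from collections.abc import Mapping


class ConfigNode(Mapping):
    """Hypothetical nested config node (an assumption, not drx's DRConfig)."""

    def __init__(self, **children):
        # Each child is either a leaf value or another ConfigNode.
        self.__dict__.update(children)

    def _flatten(self):
        # Collect every leaf parameter under its own (unqualified) name.
        flat = {}
        for name, value in self.__dict__.items():
            if isinstance(value, ConfigNode):
                items = value._flatten().items()
            else:
                items = [(name, value)]
            for key, leaf in items:
                if key in flat:
                    # Duplicate leaf names cannot be represented flat.
                    raise ValueError(f"ambiguous parameter name: {key!r}")
                flat[key] = leaf
        return flat

    # Mapping protocol: enables dict(config) and f(**config).
    def __getitem__(self, key):
        return self._flatten()[key]

    def __iter__(self):
        return iter(self._flatten())

    def __len__(self):
        return len(self._flatten())


def build_model(**params):
    # Hypothetical stand-in for an abstraction's constructor.
    return params


config = ConfigNode(
    modeling=ConfigNode(automl=ConfigNode(blend_best_models=True)),
    project=ConfigNode(project_description="foo"),
)
config.modeling.automl.blend_best_models = False  # nested, discoverable form

flat = dict(config)              # flat dictionary form
model_params = build_model(**config)  # unpacks exactly like a dictionary
```

In this sketch the flat form only works while leaf names are unique: two categories that both define a parameter with the same name raise a ValueError when flattened, which mirrors why ambiguous names would require the nested representation.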

Async execution#

drx uses the asyncio and concurrent.futures modules from the Python standard library to initiate and monitor long-running tasks concurrently when running in an interactive notebook. This retains interactivity during time-consuming computations, allowing users to explore other ideas or hypotheses from the same notebook while waiting for a job to finish.

In a notebook, methods that return data, such as predict() and predict_proba(), return a drx.FutureDataFrame object immediately without blocking the notebook. The notebook blocks only when the underlying attributes or data are accessed.

In scripts, the default behavior is serial execution. Each command will run to completion before the next command is executed.

Execution mode          Notebook            Script
Concurrent execution    Default behavior    Not available
Serial execution        Upon request        Default behavior
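The return-immediately, block-on-access behavior described above can be sketched with concurrent.futures alone. This is an illustration under stated assumptions, not the drx implementation: LazyResult, slow_predict, and predict are hypothetical names, a plain list stands in for a data frame, and asyncio is left out for brevity.

```python
import time
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=2)


class LazyResult:
    """Hypothetical future-backed result (an assumption, not drx.FutureDataFrame)."""

    def __init__(self, future):
        self._future = future

    def __getattr__(self, name):
        # Called only for attributes not found on the wrapper itself.
        if name == "_future":
            raise AttributeError(name)  # avoid recursion if not yet initialized
        # result() blocks until the background job has finished.
        return getattr(self._future.result(), name)


def slow_predict(rows):
    time.sleep(0.5)  # simulate a long-running prediction job
    return [r * 2 for r in rows]


def predict(rows):
    # Submit the job and hand back a wrapper without waiting for it.
    return LazyResult(_executor.submit(slow_predict, rows))


start = time.perf_counter()
preds = predict([1, 2, 3])
submitted_in = time.perf_counter() - start  # small: nothing has blocked yet

position = preds.index(4)  # first attribute access blocks until the job is done
finished = preds._future.done()
```

The design choice shown here is to defer blocking to the first attribute access, so the caller stays free to do other work between submitting the job and using its result.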

Waiting for fit() to complete in a notebook before predicting or deploying#


model.fit(df, target='your_target_col')

# This predict() will make predictions as soon as a trained model is available
model.predict(score)

# This predict() call blocks the notebook until autopilot has completed
model.predict(score, wait_for_autopilot=True)

# When executed from a stand-alone script, both will block until fit() is complete
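As a rough analogy for the two predict() calls above, the standard library distinguishes between waiting for the first finished job and waiting for all of them. The sketch below is not drx code and makes no claim about how drx or autopilot work internally; train and the job names are hypothetical.

```python
import time
from concurrent.futures import (
    ALL_COMPLETED,
    FIRST_COMPLETED,
    ThreadPoolExecutor,
    wait,
)


def train(name, seconds):
    time.sleep(seconds)  # simulate training one candidate model
    return name


with ThreadPoolExecutor(max_workers=3) as pool:
    jobs = [
        pool.submit(train, name, seconds)
        for name, seconds in [("fast", 0.1), ("medium", 0.5), ("slow", 0.8)]
    ]

    # Analogous to predicting as soon as any trained model is available.
    done_first, pending_first = wait(jobs, return_when=FIRST_COMPLETED)
    available_first = {job.result() for job in done_first}

    # Analogous to waiting for the whole run to complete before predicting.
    done_all, pending_all = wait(jobs, return_when=ALL_COMPLETED)
    all_models = sorted(job.result() for job in done_all)
```

Waiting for the first result gives an answer sooner, at the cost of possibly not using the best model; waiting for all of them trades latency for completeness.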

FAQ#

How is drx different from the existing Python API experience?#

The existing Python client is extremely flexible, powerful, and configurable. However, certain common workflows may require multiple intermediate steps, and the learning curve can be steep for new users.

drx aims to provide a streamlined experience for the most common workflows while also offering new, experimental high-level abstractions.

Will drx be incorporated into the product?#

Certain prototypes and concepts from drx may eventually be included in the product.

Is drx the same as the “Idiomatic Python SDK” project?#

No. drx is focused on exploring, prototyping, and validating longer-term experimental concepts, and it provides no guarantees of backwards compatibility, feature completeness, or ongoing support.

Can I share drx with customers?#

Not at this time.

How do I contribute to drx or share feedback?#

Reach out to Marcus Braun or Marshall Krassenstein for access to the private repository.

What’s on the roadmap for development?#

The following features are either planned or being considered for future releases:

  • Building blocks for easily rolling your own novel mini-apps and visualizations

  • Model factory and experimentation helpers

  • Saving/restoring model state/initialization parameters (from a project id)

  • General support for feature discovery projects, including streamlined configuration

  • Deployment predictions using AI catalog datasets (and batch prediction API)