About#
Project goals#
drx intends to explore and prototype a programmatic DataRobot experience that is:
- Declarative and simple by default
  - Streamlines common workflows
  - Uses broadly familiar syntax and verbiage where possible
- Unobtrusively customizable
  - Allows default behaviors and configuration to be easily overridden…
  - …but not at the expense of complicating the common experience
- Experimental
  - Accelerates user experimentation and discovery of productionizable artifacts
  - Offers new and yet-to-be-proven abstractions and concepts
  - Is not to be productionized itself
Configuration of abstractions#
DataRobot provides dozens of settings and configuration options that govern
execution behavior. drx aims to provide a streamlined default experience without
compromising on the ability to customize, configure or override.
To this end, drx abstractions are typically structured in the following manner:
- The most important configuration parameters (typically limited to ~7) are exposed and documented in the abstraction’s constructor as keyword arguments. Wherever possible these keyword arguments are optional.
- Base models for core problem types can be configured using the DRConfig configuration object class. This class:
  - Enables easy, inline discovery and specification of the many DataRobot parameters using autocomplete, docstrings, and nesting of parameters by category
  - Preserves an equally interchangeable flat dictionary representation
  - Maintains the same names and descriptions used in the DataRobot REST API for all parameters that get passed to DataRobot
  - See the reference documentation for examples of usage.
- When a parameter name is known in advance, it can be passed directly to the abstraction’s constructor (e.g. AutoMLModel()) or after construction using the set_params() method (sketched briefly below):
  - As additional optional keyword arguments: AutoMLModel(project_description='foo') or set_params(project_description='foo')
  - Within a dictionary to be unpacked: AutoMLModel(**my_dict_of_params) or set_params(**my_dict_of_params)
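For illustration, a minimal sketch of these options, assuming AutoMLModel has already been imported and that project_description is a valid parameter (both taken from the bullets above):

# Pass a known parameter directly to the constructor
model = AutoMLModel(project_description='foo')

# ...or set (or override) it after construction
model = AutoMLModel()
model.set_params(project_description='foo')

# ...or collect parameters in a flat dictionary and unpack it
my_dict_of_params = {'project_description': 'foo'}
model = AutoMLModel(**my_dict_of_params)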
Examples#
Configuration objects can be interchanged with dictionaries, trading off ease of discovery (e.g. autocomplete and documentation capabilities) for succinctness.
# -------------------------------------------------------------------
# Configuration discovered through sequential, shift-tab autocomplete
# -------------------------------------------------------------------
config_1 = DRConfig() # or construct by calling get_params() on an abstraction
config_1.Modeling.AutoML.blend_best_models = False
# ----------------------------------------------------------
# Configuration discovered by reading the HTML documentation
# ----------------------------------------------------------
config_2 = drx.ModelingAutoMLConfig(blend_best_models=False)
# -----------------------------------------------------------
# Direct configuration (e.g. parameter name known in advance)
# -----------------------------------------------------------
config_3 = {'blend_best_models': False}
# ----------
# Equivalent
# ----------
model_1 = AutoMLModel(**config_1)
model_2 = AutoMLModel(**config_2)
model_3 = AutoMLModel(**config_3)
model_4 = AutoMLModel(blend_best_models=False)
In general, parameters have unique names, which allows use of the flat
dictionary representation seen with config_3 above. If parameter
names become ambiguous in the future, configurations using such duplicated
parameter names will likely need to use the nested representation.
Async execution#
drx leverages the Python standard library asyncio and
concurrent.futures modules to initiate and monitor
execution of long-running tasks concurrently when running in an interactive notebook.
This is done to retain interactivity during time-consuming computations, allowing users
to explore other ideas or hypotheses from the same notebook while waiting for a job to finish.
In a notebook, methods that return data, such as predict() and predict_proba(),
return a drx.FutureDataFrame object immediately without blocking the notebook.
The notebook only blocks when the underlying attributes or data are actually accessed.
In scripts, the default behavior is serial execution. Each command will run to completion before the next command is executed.
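For example, a minimal sketch of this behavior in a notebook; model is assumed to be a fitted drx model, score a scoring DataFrame, and head() an assumed pandas-style accessor on the returned object:

preds = model.predict(score)  # returns a drx.FutureDataFrame immediately; the cell does not block
# ...other cells can run while the prediction job executes...
preds.head()  # first access to the underlying data blocks until results are ready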
|  | Notebook | Script |
|---|---|---|
| Concurrent execution | Default behavior | Not available |
| Serial execution | Upon request | Default behavior |
Waiting for fit() to complete in a notebook before predicting or deploying#
model.fit(df, target='your_target_col')
# This predict() will make predictions as soon as a trained model is available
model.predict(score)
# This predict() call blocks the notebook until autopilot has completed
model.predict(score, wait_for_autopilot=True)
# When executed from a stand-alone script, both will block until fit() is complete
FAQ#
How is drx different from the existing python API experience?#
The existing Python client is extremely flexible, powerful, and configurable. However, certain common workflows may require multiple intermediate steps, and the learning curve can be steep for new users.
drx aims to provide a streamlined experience for the most common workflows,
but also offer new, experimental high-level abstractions.
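As a rough sketch of what such a streamlined workflow can look like (reusing names from the examples above; df and score are assumed to be pandas DataFrames):

model = AutoMLModel()  # declarative defaults; customize only if needed
model.fit(df, target='your_target_col')  # start training without intermediate setup steps
predictions = model.predict(score)  # predictions as soon as a trained model is available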
Will drx be incorporated into the product?#
Certain prototypes and concepts from drx may eventually be included in the product.
Is drx the same as the “Idiomatic Python SDK” project?#
No; drx is focused on exploring, prototyping, and validating longer-term experimental
concepts, and it provides no guarantees of backwards compatibility, feature completeness,
or ongoing support.
What’s on the roadmap for development?#
The following features are either planned or being considered for future releases:
- Building blocks for easily rolling your own novel mini-apps and visualizations
- Model factory and experimentation helpers
- Saving/restoring model state/initialization parameters (from a project id)
- General support for feature discovery projects, including streamlined configuration
- Deployment predictions using AI catalog datasets (and batch prediction API)