Scoring Code
To get started, instantiate a ScoringCodeModel with the path to a Scoring Code jar file:

```python
from datarobot_predict.scoring_code import ScoringCodeModel

model = ScoringCodeModel("model.jar")
```
To get predictions from the model, pass a pandas DataFrame to the predict method:

```python
result_df = model.predict(df)
```
The Scoring Code jar file can be downloaded using the DataRobot Python Client. This example shows how to fetch Scoring Code from a deployment and use it to make predictions:

```python
# pip install datarobot
import datarobot as dr

from datarobot_predict.scoring_code import ScoringCodeModel

dr.Client(endpoint="https://app.datarobot.com/api/v2", token="<API_TOKEN>")

deployment = dr.Deployment.get(deployment_id="<DEPLOYMENT_ID>")
deployment.download_scoring_code("model.jar")

model = ScoringCodeModel("model.jar")
result_df = model.predict(df)
```
Feature types
The column types of the input DataFrame can affect the predicted values. A typical way to read a DataFrame from csv is to simply use:

```python
df = pd.read_csv("input.csv")
```
This will cause pandas to auto-detect column types from the input. If a detected column type is incompatible with the DataRobot feature type, this can cause problems.
An example of this is the following boolean-like csv column, which contains a missing value:

```
column_name
FALSE
TRUE

FALSE
```
DataRobot will detect this as a categorical feature because it has missing values. Pandas will detect it as boolean and read it into the following DataFrame:

```
column_name
False
True
NaN
False
```
When this column has to be converted to categorical, the values will no longer be capitalized, which means they will not match the expected TRUE or FALSE values.
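The mismatch can be reproduced with pandas alone. This is an illustrative sketch, not DataRobot-specific code; a hypothetical second column, row_id, stands in for the rest of the csv so that the missing value survives parsing:

```python
import io

import pandas as pd

# Hypothetical csv: a boolean-like column with one missing value.
csv_data = "column_name,row_id\nFALSE,1\nTRUE,2\n,3\nFALSE,4\n"

df = pd.read_csv(io.StringIO(csv_data))

# Because of the missing value, the column holds booleans plus NaN.
print(df["column_name"].dtype)  # object

# A parsed boolean stringifies as "False", which no longer matches "FALSE".
print(str(df["column_name"].iloc[0]))  # False
```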
The solution to this is to use appropriate types when reading from csv. For the example above, we could read the column as categorical:

```python
df = pd.read_csv("input.csv", dtype={"column_name": "category"})
```
A more general workaround is to read all or some columns as raw strings:

```python
# Use str type for all columns
df = pd.read_csv("input.csv", dtype=str)

# Use str type only for column_name
df = pd.read_csv("input.csv", dtype={"column_name": "str"})
```
This will force the parsing of values to be performed by logic in the Scoring Code JAR, which will do it consistently with DataRobot.
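As a quick sanity check, the effect of the dtype overrides can be seen with an in-memory csv. This is an illustrative sketch; the row_id column is hypothetical padding that keeps the missing value from being dropped as a blank line:

```python
import io

import pandas as pd

csv_data = "column_name,row_id\nFALSE,1\nTRUE,2\n,3\nFALSE,4\n"

# Raw strings: the original "FALSE"/"TRUE" text is preserved.
df_str = pd.read_csv(io.StringIO(csv_data), dtype={"column_name": str})
print(df_str["column_name"].iloc[0])  # FALSE

# Categorical: the categories keep the original capitalization too.
df_cat = pd.read_csv(io.StringIO(csv_data), dtype={"column_name": "category"})
print(df_cat["column_name"].cat.categories.tolist())  # ['FALSE', 'TRUE']
```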
Prediction Explanations
To compute Prediction Explanations, the Scoring Code model must have Prediction Explanations enabled. For more info, see the DataRobot docs page about Scoring Code download.
To compute explanations, set max_explanations to a positive value:

```python
df_with_explanations = model.predict(df, max_explanations=3)
```
Time Series
For a Time Series model, forecast point predictions are returned by default if no other arguments are provided. The forecast point can be specified using the forecast_point parameter or auto-detected:

```python
import datetime

result_df = model.predict(df, forecast_point=datetime.datetime(1958, 6, 1))
```
To do historical predictions, set time_series_type accordingly:

```python
import datetime

from datarobot_predict import TimeSeriesType

result_df = model.predict(
    df,
    time_series_type=TimeSeriesType.HISTORICAL,
    predictions_start_date=datetime.datetime(2020, 1, 1),
    predictions_end_date=datetime.datetime(2022, 6, 1),
)
```
The date column in the input is expected to be a string in the same date format used when the model was trained.
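For example, a real datetime column can be rendered back to strings before predicting. This is an illustrative sketch with made-up data; the "%Y-%m-%d" format is an assumption, and you should use whatever date format your training data actually used:

```python
import pandas as pd

# Hypothetical input frame with a real datetime column.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["1958-06-01", "1958-07-01"]),
        "value": [1.0, 2.0],
    }
)

# Render dates back to strings in the (assumed) training format.
df["date"] = df["date"].dt.strftime("%Y-%m-%d")
print(df["date"].tolist())  # ['1958-06-01', '1958-07-01']
```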
Prediction Intervals
To compute Prediction Intervals, the Scoring Code model must have Prediction Intervals enabled. For more info, see the DataRobot docs page about Scoring Code download.
Prediction intervals are computed when prediction_intervals_length is set to a positive value:

```python
result_df = model.predict(df, prediction_intervals_length=3)
```