Scoring Code
To get started, instantiate a ScoringCodeModel with the path to a Scoring Code jar file:

```python
from datarobot_predict.scoring_code import ScoringCodeModel

model = ScoringCodeModel("model.jar")
```
To get predictions from the model, pass a pandas DataFrame to the predict method:

```python
result_df = model.predict(df)
```
The Scoring Code jar file can be downloaded using the DataRobot Python Client. This example shows how to fetch Scoring Code from a deployment and use it to make predictions:

```python
# pip install datarobot
import datarobot as dr

from datarobot_predict.scoring_code import ScoringCodeModel

dr.Client(endpoint="https://app.datarobot.com/api/v2", token="<API_TOKEN>")

deployment = dr.Deployment.get(deployment_id="<DEPLOYMENT_ID>")
deployment.download_scoring_code("model.jar")

model = ScoringCodeModel("model.jar")
result_df = model.predict(df)
```
Feature types
The column types of the input DataFrame can affect the predicted values. A typical way to read a DataFrame from csv is to simply use:

```python
df = pd.read_csv("input.csv")
```
This will cause pandas to auto-detect column types from the input. If a detected column type is incompatible with the DataRobot feature type, this can cause problems.
An example of this is the following boolean-like csv column, which contains a missing value:

```
column_name
FALSE
TRUE

FALSE
```
DataRobot will detect this as a categorical feature because it has missing values. Pandas will detect it as boolean and read it into the following DataFrame:

```
column_name
False
True
NaN
False
```
When this column has to be converted to categorical, the values will no longer be capitalized, which means they will not match the expected TRUE or FALSE values.
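The mismatch can be reproduced with pandas alone. This is an illustrative sketch, not DataRobot-specific code; a hypothetical second column, row_id, stands in for the rest of the csv so that the missing value survives parsing:

```python
import io

import pandas as pd

# Hypothetical csv: a boolean-like column with one missing value.
csv_data = "column_name,row_id\nFALSE,1\nTRUE,2\n,3\nFALSE,4\n"

df = pd.read_csv(io.StringIO(csv_data))

# Because of the missing value, the column holds booleans plus NaN.
print(df["column_name"].dtype)  # object

# A parsed boolean stringifies as "False", which no longer matches "FALSE".
print(str(df["column_name"].iloc[0]))  # False
```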
The solution to this is to use appropriate types when reading from csv. For the example above, we could read the column as categorical:

```python
df = pd.read_csv("input.csv", dtype={"column_name": "category"})
```
A more general workaround is to read all or some columns as raw strings:

```python
# Use str type for all columns
df = pd.read_csv("input.csv", dtype=str)

# Use str type only for column_name
df = pd.read_csv("input.csv", dtype={"column_name": "str"})
```
This will force the parsing of values to be performed by logic in the Scoring Code JAR, which will do it consistently with DataRobot.
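As a quick sanity check, the effect of the dtype overrides can be seen with an in-memory csv. This is an illustrative sketch; the row_id column is hypothetical padding that keeps the missing value from being dropped as a blank line:

```python
import io

import pandas as pd

csv_data = "column_name,row_id\nFALSE,1\nTRUE,2\n,3\nFALSE,4\n"

# Raw strings: the original "FALSE"/"TRUE" text is preserved.
df_str = pd.read_csv(io.StringIO(csv_data), dtype={"column_name": str})
print(df_str["column_name"].iloc[0])  # FALSE

# Categorical: the categories keep the original capitalization too.
df_cat = pd.read_csv(io.StringIO(csv_data), dtype={"column_name": "category"})
print(df_cat["column_name"].cat.categories.tolist())  # ['FALSE', 'TRUE']
```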
Prediction Explanations
To compute Prediction Explanations, the Scoring Code model must have Prediction Explanations enabled. For more info, see the DataRobot docs page about Scoring Code download.
To compute explanations, set max_explanations to a positive value:

```python
df_with_explanations = model.predict(df, max_explanations=3)
```
Time Series
For a Time Series model, forecast point predictions are returned by default if no other arguments are provided. The forecast point can be specified using the forecast_point parameter or auto-detected:

```python
import datetime

result_df = model.predict(df, forecast_point=datetime.datetime(1958, 6, 1))
```
To do historical predictions, set time_series_type accordingly:

```python
import datetime

from datarobot_predict import TimeSeriesType

result_df = model.predict(
    df,
    time_series_type=TimeSeriesType.HISTORICAL,
    predictions_start_date=datetime.datetime(2020, 1, 1),
    predictions_end_date=datetime.datetime(2022, 6, 1),
)
```
The date column in the input is expected to be a string in the same date format used when the model was trained.
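For example, a real datetime column can be rendered back to strings before predicting. This is an illustrative sketch with made-up data; the "%Y-%m-%d" format is an assumption, and you should use whatever date format your training data actually used:

```python
import pandas as pd

# Hypothetical input frame with a real datetime column.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["1958-06-01", "1958-07-01"]),
        "value": [1.0, 2.0],
    }
)

# Render dates back to strings in the (assumed) training format.
df["date"] = df["date"].dt.strftime("%Y-%m-%d")
print(df["date"].tolist())  # ['1958-06-01', '1958-07-01']
```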
Prediction Intervals
To compute Prediction Intervals, the Scoring Code model must have Prediction Intervals enabled. For more info, see the DataRobot docs page about Scoring Code download.
Prediction intervals are computed when prediction_intervals_length is set to a positive value:

```python
result_df = model.predict(df, prediction_intervals_length=3)
```