Budget Allocation: MorphL AI's marketing science solution

1 Company introduction

MorphL is a foreign company that provides AI solutions (PS: its website has a nice UI):
Website: https://morphl.io/products/morphl-cloud.html

MorphL Community Edition
MorphL Community Edition uses big data and machine learning to predict user behavior in digital products and services. Its goal is to improve KPIs (click-through rate, conversion rate, etc.) through personalization. The main models include:

  • Model 1: shopping stage - targeting high-potential buyers
    Pinpoint users who are more likely to add items to the cart, check out, or complete a transaction.
  • Model 2: cart abandonment - targeting users who are likely to drop off
    Pinpoint users who are more likely to abandon their shopping cart in the current or next session.
  • Model 3: customer LTV - lifecycle model
    Reduce customer churn and turn customers into loyal ones by focusing on users with low or medium customer lifetime value.
  • Personalized recommendation model
  • Related product model
  • High-frequency purchase model
  • Search intent
  • Audience segmentation
  • Churn warning

2 Budget allocation

In MorphL's framework, budget allocation consists of two steps:

  • Estimate the function that maps budget -> revenue
  • Solve the budget allocation optimization model across campaigns

Step 1: budget / revenue forecast function

Revenue(t) = f(Cost(t) | Cost(t-1), Revenue(t-1), ..., Cost(t0), Revenue(t0))

That is, the revenue for the current cost is forecast from historical budget / revenue data.

Step 2: budget optimization
Once you have the budget / revenue forecast function for each campaign, you can start solving the budget optimization. There are three cases. In the plots for each case:

The yellow line is the cumulative budget (amount invested);
The blue line is the revenue returned as a function of the budget.

The peak of the curve marks the best budget range, which can guide budget allocation.
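
As a rough illustration of step 2, the sketch below splits a fixed budget across campaigns whose budget / revenue curves show diminishing returns. The curve shape (a * sqrt(cost)), the campaign names, the coefficients, and the greedy step size are all assumptions made for illustration; the source does not say which optimizer MorphL uses.

```python
import math

# Hypothetical budget -> revenue curves with diminishing returns:
# revenue = a * sqrt(cost). Names and coefficients are made up.
curves = {"campaign_a": 12.0, "campaign_b": 8.0, "campaign_c": 4.0}

def revenue(a, cost):
    return a * math.sqrt(cost)

def marginal_gain(a, spent, step):
    # Extra revenue obtained by spending one more step on this campaign
    return revenue(a, spent + step) - revenue(a, spent)

def allocate(curves, total_budget, step=1.0):
    """Greedy allocation: each budget step goes to the campaign
    with the largest marginal revenue at its current spend."""
    spend = {name: 0.0 for name in curves}
    remaining = total_budget
    while remaining >= step:
        best = max(curves, key=lambda n: marginal_gain(curves[n], spend[n], step))
        spend[best] += step
        remaining -= step
    return spend

allocation = allocate(curves, total_budget=100.0)
total_revenue = sum(revenue(curves[n], allocation[n]) for n in curves)
print(allocation, round(total_revenue, 2))
```

For concave curves like these, the greedy rule puts more budget into the campaigns with the steeper curves and beats an even three-way split; with real forecast functions a proper constrained optimizer would likely be a better fit.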

3 Interpretation of relevant cases

3.1 Data format

GitHub address: Morphl-AI/Ecommerce-Marketing-Spend-Optimization

The repository publishes two data sources:

  • Marketing spend data: year, total investment, and revenue for TV, Digital and other channels
  • Channel conversion data: ad ID, FB campaign ID, age, gender, impressions, clicks, cost, conversions, etc.

The cases below walk through the methods the repository commonly uses:

3.2 Case 2: Budget optimization - basic statistical model

These are actually several very simple methods:

  • Revenue ~ cost: ROI is computed by direct division (total revenue / total cost)
  • Revenue ~ exposure and exposure ~ cost: likewise converted by direct division

3.3 Case 4: Budget allocation - pseudo-revenue - first-revenue assumption - regressions

  • A regression model is used to estimate Revenue ~ cost
  • Two variants are illustrated: a two-variable regression, Revenue ~ cost, and a regression with covariates, such as rev ~ cost + click
    This case introduces a bucket index, which I do not fully understand. My guess is that a bucket is a reasonable window of activity, similar to a session.
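
To make the Revenue ~ cost regression concrete, here is a minimal sketch of a two-variable ordinary least squares fit in plain Python. The data points and the fit_line helper are hypothetical and are not taken from the repository, whose notebooks may fit the regression differently.

```python
# Hypothetical cost / revenue observations (revenue is exactly 2 * cost + 5)
cost = [10.0, 20.0, 30.0, 40.0, 50.0]
revenue = [25.0, 45.0, 65.0, 85.0, 105.0]

def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance of x and y, and variance of x (both unnormalized)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(cost, revenue)
predicted = slope * 60.0 + intercept  # forecast revenue for a cost of 60
print(slope, intercept, predicted)
```

Adding a covariate such as clicks turns this into a multiple regression, which is usually solved with a linear algebra routine rather than by hand.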

Let a bucket be: $Cost_B = [0, 0, 50, 20, 0, 15]$, $Revenue_B = [30, 100]$.
This means that the first revenue (30) was generated by the first two costs alone (both of which are zero),
so we merged the next bucket as well.
We'll sum them, getting $C_{\Sigma B} = 85$ and $R_{\Sigma B} = 130$. Then, the bucket constant is: $\alpha_B = 130/85 = 1.529$.
Then, our pseudo-revenues will be: $Pseudo\text{-}Revenue_B = [0 \cdot \alpha_B, 0 \cdot \alpha_B, 50 \cdot \alpha_B, 20 \cdot \alpha_B, 0 \cdot \alpha_B, 15 \cdot \alpha_B] = [0, 0, 76.45, 30.58, 0, 22.935]$.
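
The bucket arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the pseudo_revenue helper is a hypothetical name, and the zero-spend check is my own assumption about how an empty bucket should be handled.

```python
cost_b = [0, 0, 50, 20, 0, 15]
revenue_b = [30, 100]

def pseudo_revenue(costs, revenues):
    """Spread a bucket's total revenue over its costs, in proportion to each cost."""
    total_cost = sum(costs)
    if total_cost == 0:
        # Assumption: a bucket without spend must be merged with a neighbour first
        raise ValueError("bucket has no spend; merge it with the next bucket")
    alpha = sum(revenues) / total_cost  # the bucket constant
    return alpha, [c * alpha for c in costs]

alpha_b, pseudo_b = pseudo_revenue(cost_b, revenue_b)
print(round(alpha_b, 3), [round(p, 2) for p in pseudo_b])
```

Because the code keeps the unrounded constant 130/85, its pseudo-revenues differ slightly in the second decimal from the figures above, which use the rounded value 1.529. Either way the pseudo-revenues add up to (approximately) the bucket's total revenue of 130.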

Based on the example above, my guess is:

  • Why is there no one-to-one correspondence $[0, 0, 50, 20, 0, 15] \to [r_1, r_2, r_3, r_4, r_5, r_6]$?
    Because spend and recorded revenue are not synchronized: it takes some time after the spend before the revenue is counted.
  • How can they be matched one to one?
    Some data interpolation strategy can be adopted, such as computing a single constant for the whole bucket.

3.4 Case 5: Budget allocation - pseudo-revenue - one-week assumption - regressions

Case 4 probably targets intermittent campaigns, while case 5 probably targets long-running ones.
The bucket interval here is therefore fixed at one week, and the calculation is done per week.
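
A minimal sketch of the one-week assumption, under my reading that each fixed 7-day window forms one bucket with its own constant. The daily figures and the weekly_pseudo_revenue helper are hypothetical, not taken from the repository.

```python
# Hypothetical two weeks of daily data (7 values per week)
daily_cost = [10, 0, 20, 0, 0, 30, 40, 5, 5, 5, 5, 5, 5, 10]
daily_revenue = [0, 0, 0, 150, 0, 0, 50, 0, 0, 0, 0, 0, 0, 60]

def weekly_pseudo_revenue(costs, revenues, days_per_bucket=7):
    """Compute one bucket constant per fixed-length window and
    spread that window's revenue over its daily costs."""
    pseudo = []
    for start in range(0, len(costs), days_per_bucket):
        bucket_cost = costs[start:start + days_per_bucket]
        bucket_rev = revenues[start:start + days_per_bucket]
        total_cost = sum(bucket_cost)
        # Assumption: a week without spend gets a constant of 0
        alpha = sum(bucket_rev) / total_cost if total_cost else 0.0
        pseudo.extend(c * alpha for c in bucket_cost)
    return pseudo

pseudo = weekly_pseudo_revenue(daily_cost, daily_revenue)
print(pseudo[:7], pseudo[7:])
```

Each day now has a pseudo-revenue that can be paired with that day's cost, which is what the regression needs.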

4 Code test

4.1 Model 1: single-coefficient revenue forecast

This corresponds to the Jupyter notebook "2. Budget optimization - basic statistical model".

Directly compute $Rev / Cost$:

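import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display  # display() is used below to render DataFrames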

'''
Model 1: directly calculate the total ROI
Directly modeling f(Cost) = Revenue
'''
class StatisticalModel:
    def __init__(self):
        # This model has a single parameter: the ratio of summed targets to summed inputs
        self.param = np.nan

    def fit(self, x, t):
        # Guard against fitting the same model twice
        assert np.isnan(self.param), "model is already fitted"
        self.param = t.sum() / x.sum()  # Core step: a very simple ROI is used as the coefficient

    def predict(self, x):
        assert not np.isnan(self.param), "call fit() before predict()"
        return x * self.param
    
def errorL1(y, t):
    return np.abs(y - t).mean()

def plot(model, valData, xKey, tKey):
    validCampaigns = list(valData.keys())
    ax = plt.subplots(len(validCampaigns), figsize=(5, 30))[1]
    for i, k in enumerate(validCampaigns):
        x = valData[k][xKey]
        t = valData[k][tKey]
        y = model[k].predict(x)
        ax[i].scatter(x, y, label="%s Predicted" % (tKey))
        ax[i].scatter(x, t, label="%s Actual" % (tKey))
        ax[i].set_title(k)
        ax[i].legend()

# Read in the data

conversion_data = pd.read_csv('Datasets/conversion_data.csv')
# marketing_spend_data = pd.read_csv('Datasets/marketing_spend_data.csv')


model_cost_revenue = {}
predictions_cost_revenue = {}
errors_cost_revenue = {}
displayDf = pd.DataFrame()
res_cost_revenue = []

campaigns = set(conversion_data['xyz_campaign_id'])
# Sequential 80/20 train / validation split per campaign (no shuffling)
trainData = {}
valData = {}
for k in campaigns:
    data = conversion_data[conversion_data['xyz_campaign_id'] == k]
    num = int(len(data)*0.8)
    trainData[k] = data[:num]
    valData[k] = data[num:]

# Cost_col = 'Cost'
# Revenue_col = 'Revenue'
Cost_col = 'Spent'  # spend (the input)
Revenue_col = 'Total_Conversion'  # conversions, used here as the revenue (the output)

for k in campaigns:
    model_cost_revenue[k] = StatisticalModel()
    model_cost_revenue[k].fit(trainData[k][Cost_col], trainData[k][Revenue_col])
    predictions_cost_revenue[k] = model_cost_revenue[k].predict(valData[k][Cost_col])
    errors_cost_revenue[k] = errorL1(predictions_cost_revenue[k], valData[k][Revenue_col])
    res_cost_revenue.append([k, trainData[k][Cost_col].sum(), trainData[k][Revenue_col].sum(), \
                model_cost_revenue[k].param, errors_cost_revenue[k]])


displayDf = pd.DataFrame(res_cost_revenue, columns=["Campaign", Cost_col, Revenue_col, "Fit", "Error (L1)"])
display(displayDf)
print("Mean error:", displayDf["Error (L1)"].mean())

plot(model_cost_revenue, valData, Cost_col, Revenue_col)

This is only an example.

4.2 Model 2: accounting for exposure

Similarly, this model chains cost -> exposure -> revenue:

Cost x Revenue ~= Cost x Sessions + Sessions x Revenue

exposure = a1 * cost
revenue = a2 * exposure

so that revenue = a1 * a2 * cost.

The fit is done in two steps; the code is again mainly excerpted from the notebook "2. Budget optimization - basic statistical model".

# Arbitrarily pick a column to play the role of sessions
session_col = 'Impressions'  # exposure
Cost_col = 'Spent'  # spend (the input)
Revenue_col = 'Total_Conversion'  # conversions, used here as the revenue (the output)

# Step 1: exposure = a1 * cost
model_cost_sessions = {}
predictions_cost_sessions = {}
errors_cost_sessions = {}
displayDf = pd.DataFrame()
res_cost_sessions = []
for k in campaigns:
    model_cost_sessions[k] = StatisticalModel()
    model_cost_sessions[k].fit(trainData[k][Cost_col], trainData[k][session_col])
    predictions_cost_sessions[k] = model_cost_sessions[k].predict(valData[k][Cost_col])
    errors_cost_sessions[k] = errorL1(predictions_cost_sessions[k], valData[k][session_col])
    res_cost_sessions.append([k, trainData[k][Cost_col].sum(), trainData[k][session_col].sum(), \
                model_cost_sessions[k].param, errors_cost_sessions[k]])

displayDf = pd.DataFrame(res_cost_sessions, columns=["Campaign", Cost_col, session_col, "Fit", "Error (L1)"])
display(displayDf)
print("Mean error:", displayDf["Error (L1)"].mean())

plot(model_cost_sessions, valData, Cost_col, session_col)

# Step 2: revenue = a2 * exposure
model_sessions_revenue = {}
predictions_sessions_revenue = {}
errors_sessions_revenue = {}
displayDf = pd.DataFrame()
res_sessions_revenue = []
for k in campaigns:
    model_sessions_revenue[k] = StatisticalModel()
    model_sessions_revenue[k].fit(trainData[k][session_col], trainData[k][Revenue_col])
    predictions_sessions_revenue[k] = model_sessions_revenue[k].predict(valData[k][session_col])
    errors_sessions_revenue[k] = errorL1(predictions_sessions_revenue[k], valData[k][Revenue_col])
    res_sessions_revenue.append([k, trainData[k][session_col].sum(), trainData[k][Revenue_col].sum(), \
                model_sessions_revenue[k].param, errors_sessions_revenue[k]])

displayDf = pd.DataFrame(res_sessions_revenue, columns=["Campaign", session_col, Revenue_col, "Fit", "Error (L1)"])
display(displayDf)
print("Mean error:", displayDf["Error (L1)"].mean())

plot(model_sessions_revenue, valData, session_col, Revenue_col)

# Step 3: Merge
displayDf = pd.DataFrame()
errors_cost_revenue = {}
res_cost_revenue_combined = []

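class TwoModel(object):
    """Chain two fitted models: modelA maps cost -> sessions, modelB maps sessions -> revenue."""
    def __init__(self, modelA, modelB):
        self.modelA = modelA
        self.modelB = modelB

    def predict(self, x):
        # Apply modelA (cost -> sessions) first, then modelB (sessions -> revenue)
        return self.modelB.predict(self.modelA.predict(x))

models_cost_revenue = {k : TwoModel(model_cost_sessions[k], model_sessions_revenue[k]) for k in valData}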

for k in campaigns:
    predictions_cost_revenue[k] = models_cost_revenue[k].predict(valData[k][Cost_col])
    errors_cost_revenue[k] = errorL1(predictions_cost_revenue[k], valData[k][Revenue_col])
    res_cost_revenue_combined.append([k, errors_cost_revenue[k]])

displayDf = pd.DataFrame(res_cost_revenue_combined, columns=["Campaign", "Error (L1)"])
display(displayDf)
print("Mean error:", displayDf["Error (L1)"].mean())


plot(models_cost_revenue, valData, Cost_col, Revenue_col)

Added by skdzines on Thu, 03 Feb 2022 02:10:23 +0200