Personalized next-best action recommendation with multi-party interaction learning for automated decision-making

doi:10.1371/journal.pone.0263010

Fig 1.

Next-best action-based personalized decision-making in constrained, tailored, sequential and interactive dynamic processes with state-action-response-coupled sequences.

A decision-maker interacts with a client sequentially over t time steps. At each time point i, the client is described by demographics d_i and state s_i. The decision-maker takes an action a_i on the client’s state s_i under policy constraint p_i. The client responds to the action with behavior o_i, and the action yields a reward r_i that reflects the effect of the transition from the current state to the next state. Note that the diagram illustrates only an idealized scenario: one client interacts with one decision-maker, and each action corresponds to one response at each time point. From bottom to top: a rectangle represents the client’s demographics, a light-blue ellipse represents the client’s response, a circle represents the client’s state, a light-red ellipse represents a decision action, and a rounded rectangle represents the policy constraining that action.
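The state-action-response-coupled sequence described in this caption can be pictured as a simple record per time point. The following is a minimal sketch only: the class name `InteractionStep`, the helper functions, and the example action and response labels are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class InteractionStep:
    """One time point i of the interaction (hypothetical container)."""
    demographics: Sequence[float]  # d_i: relatively stable client attributes
    state: Sequence[float]         # s_i: client state at time i
    action: str                    # a_i: action taken by the decision-maker
    policy: str                    # p_i: policy constraining the action
    response: str                  # o_i: client's behavioral response
    reward: float                  # r_i: effect of moving to the next state


def coupled_sequence(steps: List[InteractionStep]) -> List[Tuple[Sequence[float], str, str]]:
    """Return the (state, action, response) triples in time order."""
    return [(s.state, s.action, s.response) for s in steps]


def total_reward(steps: List[InteractionStep]) -> float:
    """Sum the per-step rewards r_i over the whole interaction."""
    return sum(s.reward for s in steps)


# Illustrative values only; the labels below are made up for the example.
history = [
    InteractionStep([35.0, 1.0], [0.2], "send_reminder", "max_one_contact_per_week", "no_reply", 0.25),
    InteractionStep([35.0, 1.0], [0.4], "phone_call", "max_one_contact_per_week", "partial_payment", 0.5),
]
```

Keeping one record per time point mirrors the idealized scenario in the figure, where each action is paired with exactly one response.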

Fig 2.

The framework for modeling the next-best action-oriented personalized decision-making.

C_t is the representation describing client c at time t, s_t is the vector of the client’s state representation, and a_t denotes an action selected from the candidate action set as well as its vector representation. The output is the set of recommended next-best actions. The recommender first embeds a client’s demographics, behaviors and current state into a state vector s_t using the personalized representation module (Fig 3), then feeds s_t and a_t into the reward prediction module to evaluate the effectiveness of the action. The k actions in the candidate set with the highest predicted rewards are then recommended as the next-best actions.
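The final selection step, scoring every candidate action against the state vector and keeping the top k, can be sketched as follows. This is an illustration under stated assumptions: `recommend_next_best` and `dot_reward` are hypothetical names, and the dot-product scorer merely stands in for the paper's reward prediction module.

```python
import heapq
from typing import Callable, Dict, List, Sequence


def recommend_next_best(
    state: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    predict_reward: Callable[[Sequence[float], Sequence[float]], float],
    k: int,
) -> List[str]:
    """Return the k candidate actions with the highest predicted reward."""
    # Score each candidate action vector against the client's state vector s_t.
    scored = {name: predict_reward(state, vec) for name, vec in candidates.items()}
    # heapq.nlargest returns the names ordered from highest to lowest score.
    return heapq.nlargest(k, scored, key=scored.get)


def dot_reward(state: Sequence[float], action_vec: Sequence[float]) -> float:
    """Placeholder scorer (an assumption): inner product of state and action."""
    return sum(s * a for s, a in zip(state, action_vec))


# Hypothetical candidate actions with made-up embeddings.
candidates = {"a": [0.2, 0.9], "b": [0.8, 0.1], "c": [0.5, 0.5]}
top_two = recommend_next_best([1.0, 0.0], candidates, dot_reward, k=2)
```

Any callable that maps a state vector and an action vector to a scalar can replace `dot_reward` without changing the selection logic.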

Fig 3.

A reinforced coupled recurrent network to learn personalized client representation.

Given a client c at current time t with the description C_t, O_t,i refers to the client response at past time i, a_i is the decision action assigned to the client, o_i is the vector representation of O_t,i, and a_i also denotes the vector representation of the action. One hidden state corresponds to the action and another to the client response. d is the transformed vector corresponding to the client’s relatively stable personal information D_t, s_imp denotes the learned data-driven implicit features, s_exp denotes the transformed domain-driven explicit features, s_t is the resulting state vector representation for client c, and FC refers to fully connected networks.

Fig 4.

A coupled recurrent unit (CRU) for modeling state-action-response-coupled long-term dependencies.

Two vectors represent the historical sequences of actions and client responses, respectively. r_o and r_a are gates that control the impact of historical responses and actions on their current states. Gates z_o and z_a control the impact of the current response and action states on updating the memory of their respective histories. r_i is an interaction gate that captures the dependence between a decision action and a client response.
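The caption names the gates but does not give the update equations, so the sketch below is a GRU-style guess at how such a coupled unit could be wired, not the paper's CRU. The function names, the per-gate scalar weights in `w`, and the exact form of each update are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def cru_step(a: Vector, o: Vector, h_a: Vector, h_o: Vector,
             w: Dict[str, float]) -> Tuple[Vector, Vector]:
    """One coupled update of the action and response hidden states.

    Assumption: each gate uses a single scalar weight applied elementwise,
    which keeps the sketch short; a real unit would use weight matrices.
    """
    new_h_a, new_h_o = [], []
    for a_j, o_j, ha_j, ho_j in zip(a, o, h_a, h_o):
        # r_a / r_o: how much of each history feeds its own candidate state.
        r_a = _sigmoid(w["r_a"] * (a_j + ha_j))
        r_o = _sigmoid(w["r_o"] * (o_j + ho_j))
        # z_a / z_o: how much the candidate overwrites the stored memory.
        z_a = _sigmoid(w["z_a"] * (a_j + ha_j))
        z_o = _sigmoid(w["z_o"] * (o_j + ho_j))
        # r_i: interaction gate coupling the action and response histories.
        r_i = _sigmoid(w["r_i"] * (ha_j + ho_j))
        cand_a = math.tanh(a_j + r_a * ha_j + r_i * ho_j)
        cand_o = math.tanh(o_j + r_o * ho_j + r_i * ha_j)
        new_h_a.append((1.0 - z_a) * ha_j + z_a * cand_a)
        new_h_o.append((1.0 - z_o) * ho_j + z_o * cand_o)
    return new_h_a, new_h_o


def run_cru(actions: List[Vector], responses: List[Vector],
            w: Dict[str, float]) -> Tuple[Vector, Vector]:
    """Run the unit over paired action/response sequences from zero states."""
    dim = len(actions[0])
    h_a, h_o = [0.0] * dim, [0.0] * dim
    for a, o in zip(actions, responses):
        h_a, h_o = cru_step(a, o, h_a, h_o, w)
    return h_a, h_o
```

In this sketch the coupling enters only through r_i: the response history contributes to the action candidate state and vice versa, so a change in past responses alters the later action hidden state.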

Fig 5.

An example of representing three clients by the reinforced coupled recurrent network.

Three debtors with different demographics and past response behaviors to the same decision actions are represented in three vectors.

Fig 6.

Reward prediction for the next-best action on a client’s state.

The reward (rating) of an action for a given client state is predicted by residual networks.
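To make the idea of a residual reward predictor concrete, the sketch below concatenates a state vector and an action vector, passes them through skip-connected blocks, and reads out a scalar reward. The layer sizes, the ReLU activation, the concatenation of inputs, and all function names are assumptions for illustration; the caption does not specify the architecture.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Block = Tuple[Matrix, Vector, Matrix, Vector]


def linear(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Affine map W x + b."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def residual_block(x: Vector, W1: Matrix, b1: Vector,
                   W2: Matrix, b2: Vector) -> Vector:
    """Two-layer transform added back onto its input (the skip connection)."""
    h = linear(W2, b2, relu(linear(W1, b1, x)))
    return [x_i + h_i for x_i, h_i in zip(x, h)]


def predict_reward(state: Sequence[float], action: Sequence[float],
                   blocks: List[Block], head_w: Vector, head_b: float) -> float:
    """Concatenate state and action, apply the blocks, read out a scalar."""
    x = list(state) + list(action)
    for W1, b1, W2, b2 in blocks:
        x = residual_block(x, W1, b1, W2, b2)
    return sum(w * v for w, v in zip(head_w, x)) + head_b
```

With all block weights at zero, each block reduces to the identity, which is the property that makes residual stacks easy to optimize from a neutral starting point.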

Table 1.

The distribution of rewards to 10 actions specified by debt collection experts.

Table 2.

Average reward lift for 10 actions recommended by 11 deep models over the reward measured by domain-driven debt collection rules.

Table 3.

Precision lift of 10 actions recommended by 11 deep models over that achieved by domain-driven debt collection rules.

Table 4.

The mean squared error (MSE) per action between the reward assigned by the domain-driven debt collection rules and the reward predicted by 10 deep models.
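A per-action MSE of this kind can be computed by grouping squared differences by action. The sketch below assumes each record pairs a rule-based reward with a model reward for one action; the function name, the record layout, and the sample values are hypothetical and are not taken from the table.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, float, float]  # (action, rule_reward, model_reward)


def mse_per_action(records: Iterable[Record]) -> Dict[str, float]:
    """Mean squared difference between rule and model rewards, per action."""
    squared_sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for action, rule_reward, model_reward in records:
        squared_sums[action] += (rule_reward - model_reward) ** 2
        counts[action] += 1
    return {action: squared_sums[action] / counts[action] for action in squared_sums}


# Made-up records for illustration only.
sample = [("call", 1.0, 0.5), ("call", 0.0, 0.5), ("sms", 1.0, 1.0)]
errors = mse_per_action(sample)
```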

Fig 7.

CRN convergence w.r.t. loss value on the validation debt collection data.

The X-axis refers to the number of epochs, and the Y-axis refers to the loss value of the CRN objective function (Eq (3)).
