Textual records of business-oriented conversations between customers and agents need to be analyzed properly to acquire useful business insights that improve productivity.
For such an analysis, it is critical to identify appropriate textual segments and expressions to focus on, especially when the textual data consists of complete transcripts, which are often lengthy and redundant.
In this paper, we propose a method to identify important segments from the conversations by looking for changes in the accuracy of a categorizer designed to separate different business outcomes.
We extract effective expressions from the important segments to define various viewpoints.
In text mining, a viewpoint defines the important associations between key entities, and it is crucial that the correct viewpoints are identified.
We show the effectiveness of the method by using real datasets from a car rental service center.
1 Introduction
"Contact center" is a general term for customer service centers, help desks, and information phone lines.
Many companies operate contact centers to sell their products, handle customer issues, and address product-related and services-related issues.
In contact centers, analysts try to get insights for improving business processes from stored customer contact data.
Gigabytes of customer contact records are produced every day in the form of audio recordings of speech, transcripts, call summaries, email,
etc. Though analysis by experts results in insights that are very deep and useful, such analysis usually covers only a very small fraction (1-2%) of the total call volume and still requires a significant workload.
The demands for extracting trends and knowledge from the whole text data collection by using text mining technology, therefore, are increasing rapidly.
In order to acquire valuable knowledge through text mining, it is generally critical to identify important expressions to be monitored and compared within the textual data.
For example, given a large collection of contact records at the contact center of a manufacturer, the analysis of expressions for products and expressions for problems often leads to business value by identifying specific problems in a specific product.
If 30% of the contact records with expressions for a specific product such as "ABC" contain expressions about a specific trouble such as "cracked", while the expressions about the same trouble appear in only 5% of the contact records for similar products, then it should be a clue that the product "ABC" may actually have a crack-related problem.
An effective way to facilitate this type of analysis is to register important expressions in a lexicon such as "ABC" and "cracked" as associated respectively with their categories such as "product" and "problem" so that the behavior of terms in the same category can be compared easily.
It is actually one of the most important steps of text mining to identify such relevant expressions and their categories that can potentially lead to some valuable insights.
A failure in this step often leads to a failure in the text mining.
Also, it has been considered an artistic task that requires highly experienced consultants to define such categories, which are often described as the viewpoints for the analysis, and their corresponding expressions through trial and error.
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 458-467, Prague, June 2007.
©2007 Association for Computational Linguistics
In this paper, we propose a method to identify important segments of textual data for analysis from full transcripts of conversations.
Compared to the written summary of a conversation, a transcription of an entire conversation tends to be quite lengthy and contains various forms of redundancy.
Many of the terms appearing in the conversation are not relevant for specific analysis.
For example, greeting terms such as "Hello" and "Welcome to (Company A)" are unlikely to be associated with specific business results such as purchased-or-not and satisfied-or-not, especially because the conversation is transcribed without preserving nonverbal cues such as tone of voice and emotion. Thus it is crucial to identify key segments and notable expressions within conversations in order to acquire valuable insights from the analysis.
We exploit the fact that business conversations follow set patterns such as an opening followed by a request and the confirmation of details followed by a closing, etc. By taking advantage of this feature of business conversations, we have developed a method to identify key segments and the notable expressions within conversations that tend to discriminate between the business results.
Such key segments, which we call trigger segments, and the notable expressions associated with certain business results allow us to easily identify appropriate viewpoints for analysis.
Application of our method for analyzing nearly one thousand conversations from a rental car reservation office enabled us to acquire novel insights for improving agent productivity and resulted in an actual increase in revenues.
Organization of the Paper: We start by describing the properties of the conversation data used in this paper.
Section 3 describes the method for identifying useful viewpoints and expressions that meet the specified purpose.
Section 4 provides the results using conversational data.
After the discussion in Section 5, we conclude the paper in Section 6.
2 Business-Oriented Conversation Data
We consider business-oriented conversation data collected at contact centers handling inbound telephone sales and reservations.
Such business oriented conversations have the following properties.
• Each conversation is a one-to-one interaction between a customer and an agent.
• For many contact center processes the conversation flow is well defined in advance.
• There are a fixed number of outcomes and each conversation has one of these outcomes.
For example, in car rentals, the following conversation flow is pre-defined for the agent.
In practice most calls to a car rental center follow this call flow.
• Opening - contains greeting, brand name, name of agent
• Pick-up and return details - agent asks location, dates and times of pick up and return, etc.
• Offering car and rate - agent offers a car specifying rate and mentions applicable special offers.
• Personal details - agent asks for customer's information such as name, address, etc.
• Confirm specifications - agent recaps reservation information such as name, location, etc.
• Mandatory enquiries - agent verifies clean driving record, valid license, etc.
• Closing - agent gives confirmation number and thanks the customer for calling.
In these conversations the participants speak in turns and the segments can be clearly identified.
Figure 1 shows part of a transcribed call.
Each call has a specific outcome.
For example, each car rental transaction has one of two call types, reservation or unbooked, as an outcome.
Because the call process is pre-defined, the conversations look similar in spite of having different results.
In such a situation, finding the differences in the conversations that have effects on the outcomes
is very important, but it is very expensive and difficult to find such unknown differences by human analysis.
We show that it is possible to define proper viewpoints and corresponding expressions leading to insights on how to change the outcomes of the calls.
AGENT: Welcome to CarCompanyA.
My name is Albert.
How may I help you?
AGENT: Allright may i know the location you want to pick the car from.
CUSTOMER: Aah ok I need it from SFO.
AGENT: For what date and time.
AGENT: Wonderful so let me see ok mam so we have a 12 or 15 passenger van avilable on this location on those dates and for that your estimated total for those three dates just 300.58$ this is with Taxes with surcharges and with free unlimited free milleage.
AGENT : alright mam let me recap the dates you want to pick it up from SFO on 3rd August and drop it off on august 6th in LA alright
CUSTOMER : and one more questions Is it just in states or could you travel out of states
AGENT : The confirmation number for your booking is 221 384.
CUSTOMER : ok ok Thank you
Agent : Thank you for calling CarCompanyA and you have a great day good bye
Figure 1: Transcript of a car rental dialog (partial)
3 Trigger Segment Detection and Effective Expression Extraction
In this section, we describe a method for automatically identifying valuable segments and concepts from the data for the user-specified difference analysis.
First, we present a model to represent the conversational data.
After that we introduce a method to detect the segments where the useful concepts for the analysis appear.
Finally, we select useful expressions in each detected trigger segment.
3.1 Conversational Data Model
Each conversational data record in the collection D is denoted d_i.
Each d_i can be seen as a sequence of conversational turns, and d_i can be divided as

d_i = t_i(1) + t_i(2) + ... + t_i(M_i), (1)

where t_i(k) is the k-th turn in d_i and M_i is the total number of turns in d_i.
The + operator in the above equation can be seen as an equivalent of the string concatenation operator.
We define d_i~(j) as the portion of d_i from the beginning to turn j.
Using the same notation, the collection of the d_i~(m_k) over all records constitutes the Chronologically Cumulative Data up to turn m_k, denoted D_k:

D_k = { d_1~(m_k), d_2~(m_k), ..., d_N~(m_k) }. (2)
Figure 2 shows an image of the data model.
We set some mk and prepare the chronologically cumulative data set as shown in Figure 3.
We represent binary mutually exclusive business outcomes such as success and failure resulting from the conversations as "A" and "not A".
Figure 2: Conversation data model
Figure 3: Chronologically cumulative conversational data
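As a minimal sketch, the chronologically cumulative data construction described above can be implemented as follows, assuming each call is simply a list of turn strings (the toy calls and the helper name are illustrative, not from the paper's system):

```python
# Sketch of the chronologically cumulative data model: each call d_i is a
# list of turn strings, and cumulative(calls, m) builds D_k by keeping only
# the first m turns of every call, concatenated into one document.

def cumulative(calls, m):
    return [" ".join(turns[:m]) for turns in calls]

# two toy calls (illustrative, not real transcripts)
calls = [
    ["Welcome to CarCompanyA.", "I need a car from SFO.", "For what date?"],
    ["Welcome.", "What are your rates for vans?", "Let me check."],
]

D_1 = cumulative(calls, 1)   # openings only
D_2 = cumulative(calls, 2)   # openings plus the first customer utterance
```

Each D_k is then split into training and test portions for the categorizer described next.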
3.2 Trigger Segment Detection
Trigger segments can be viewed as portions of the data which have important features which distinguish data of class "A" from data of class "not A".
To detect such segments, we divide each chronologically cumulative data set D_k into two data sets, training data D_k^train and test data D_k^test. Starting from D_1, for each D_k we trained a classifier on D_k^train and evaluated it on D_k^test. Using accuracy, the fraction of correctly classified documents, as a metric of performance (Yang and Liu, 1999), we denote the evaluation result of the categorization as acc(categorizer(D_k)) for each D_k and plot it along with its turn.
Figure 4 shows the effect of gradually increasing the training data for the classification.
Figure 4: Transitions of acc(categorizer(D_k)) with detected trigger segments
The distribution of expressions in a business-oriented conversation will change almost synchronously because the call flow is predefined.
Therefore acc(categorizer(D_k)) will increase if features that contribute to the categorization appear in D_k. In contrast, acc(categorizer(D_k)) will decrease if no features that contribute to the categorization are in D_k. Therefore, from the transitions of acc(categorizer(D_k)), we can identify the segments with increases as triggers, where the features that have an effect on the outcome appear.
We denote a trigger segment as seg(start position, end position).
Because the total numbers of turns can be different, we do not detect the last section as a trigger.
In Figure 4, seg(m1, m2) and seg(m4, m5) are triggers.
It is important to note that using the cumulative data is key to the detection of trigger segments.
Using non-cumulative segment data would give us the categorization accuracy for the features within that segment but would not tell us whether the features of this segment are improving the accuracy or decreasing it.
It is this gradient information between segments that is key to identifying trigger segments.
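The trigger detection step above can be sketched as follows. This assumes the per-turn accuracy values acc(categorizer(D_k)) have already been computed by some classifier; the function and the sample accuracy curve are illustrative:

```python
# Detect trigger segments from the per-turn accuracy curve.
# acc[0..K-1] holds acc(categorizer(D_1)) .. acc(categorizer(D_K)).
# A trigger seg(i, j) is a maximal run where the accuracy rises from D_i
# to D_j (1-based); an unfinished rise at the end of the call is dropped,
# since the last section is never reported as a trigger.

def detect_triggers(acc):
    triggers, start = [], None
    for k in range(1, len(acc)):
        if acc[k] > acc[k - 1]:
            if start is None:
                start = k            # rise begins at D_k (1-based)
        elif start is not None:
            triggers.append((start, k))  # rise peaked at D_k
            start = None
    return triggers

# hypothetical accuracy curve over six cumulative data sets
print(detect_triggers([0.55, 0.70, 0.68, 0.66, 0.66, 0.74]))  # → [(1, 2)]
```

The rise at the very end of the curve is discarded, matching the rule that the last section is never a trigger.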
Many approaches have been proposed for document classification (Yang and Liu, 1999).
In this research, however, we are not interested in the classification accuracy itself but in the increases and decreases of the accuracy within particular segments.
For example, the greeting, or the particular method of payment, may not affect the outcome, but the mention of a specific feature of the product may have an effect on the outcome.
Therefore in our research we are interested in identifying the particular portion of the call where this product feature is mentioned, along with its mention, which has an effect on the outcome of the call.
In our experiments we used the SVM (Support Vector Machine) classifier (Joachims, 1998), but almost any classifier should work because our approach does not depend on the classification method.
3.3 Effective Expression Extraction
In this section, we describe our method to extract effective expressions from the detected trigger segments.
The effective expressions in D_k are those which are representative in the selected documents and appear for the first time in the trigger segment seg(m_i, m_j).
Numerous methods to select features exist (Hisamitsu and Niwa, 2002) (Yang and Pedersen, 1997).
We use the χ² statistic for each expression in D_k as a representativeness metric.
Table 1 shows the two-by-two contingency table of an expression w and the outcome classes.

Table 1: Contingency table for calculating the χ² statistic

                                  "A"    "not A"
# of documents including w         a       b
# of documents not including w     c       d

From this table, the statistic is computed as

χ²(w) = N (ad - bc)² / ((a+b)(c+d)(a+c)(b+d)), (3)

where N is the number of documents.
This statistic can be compared to the χ² distribution with one degree of freedom to judge representativeness.
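As a sketch, the one-degree-of-freedom chi-square statistic for such a 2x2 table can be computed as follows (the counts in the usage line are hypothetical, not from the paper's data):

```python
# One-degree-of-freedom chi-square for the 2x2 contingency table:
#   a = docs in "A" containing w          b = docs in "not A" containing w
#   c = docs in "A" not containing w      d = docs in "not A" not containing w

def chi_square(a, b, c, d):
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / den if den else 0.0

# hypothetical counts: "discount" in 30 of 70 "A" docs, 5 of 37 "not A" docs
print(round(chi_square(30, 5, 40, 32), 2))  # → 9.47
```

A large value indicates that w is distributed very differently across the two outcome classes.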
We also want to extract the expressions that have not had an effect on the outcome before D_k. To detect the new expressions in D_k, we define the metric

new(w) = w(D_k^A) / max( w(D_{k-1}^A) · (m_k / m_{k-1}), sign(w(D_k^A)) ), (4)

where w(D_k) is the frequency of expression w in the chronologically cumulative data D_k, max(a, b) selects the larger of its arguments, m_k is the number of turns in D_k, w(D_k^A) is the frequency of w in D_k when the outcome of the corresponding data is "A", and sign( ) is the signum function.
When w in class "A" appears in D_k much more frequently than in D_{k-1} relative to the ratio of their turn counts, this metric will be more than 1.
We detect significant expressions by considering the combined score χ²(w) · new(w).
Using this combined score, we can filter out the representative expressions that have already appeared before D_k and distinguish significant expressions that first appear in D_k for each class "A" and "not A".
3.4 Appropriate Viewpoint Selection
In a text mining system, to get an association that leads to a useful insight, we have to define appropriate viewpoints.
Viewpoints refer to objects in relation to other objects.
In analysis using a conventional text mining system (Nasukawa and Nagano, 2001), the viewpoints are selected based on expressions in user dictionaries prepared by domain experts.
We have identified important segments of the conversations by observing changes in the accuracy of a categorizer designed to segregate different business outcomes.
We have also been able to extract effective expressions from these trigger segments to define various viewpoints.
Hence, viewpoint selection is now based on the trigger segments and effective expressions identified automatically from the specified business outcomes.
In the next section we apply our technique to a real life dataset and show that we can successfully select useful viewpoints.
4 Experiments and Results
4.1 Experiment Data and System
We collected 914 recorded calls from the car rental help desk and manually transcribed them.
Figure 1 shows part of a call that has been transcribed.
There are three types of calls:
Reservation Calls: Calls which got converted.
Here, "converted" means the customer made a reservation for a car.
Reserved cars may or may not be picked up, so some reserved cars never get picked up by customers (no-shows and cancellations).
Unbooked Calls: Calls which did not get converted.
Service Calls: Customers changing or enquiring about a previous booking.
The distribution of the calls is given in Table 2.
Table 2: Distribution of calls
Unbooked Calls
Reservation Calls (Picked-Up)
Reservation Calls (Not Picked-Up)
Service Calls
Total Calls
The reservation calls are most important in this context, so we focus on those 137 calls.
In the reservation calls, there are two types of outcomes, car picked-up and car not picked-up.
All reservation calls look similar in spite of having different outcomes (in terms of pick up).
The reservation happens during the call but the pick up happens at a later date.
If we can find differences in the conversations that affect the outcome, we can expect to improve agent productivity.
Reservation calls follow the pre-defined reservation call flow described in Section 2, and it is very difficult to find differences between them manually.
In this experiment, using the proposed method, we try to extract trigger segments and expressions to find viewpoints that affect the outcome of the reservation calls.
For the analysis, we constructed a text mining system for the difference analysis "picked-up" vs. "not picked-up".
The experimental system consists of two parts, an information extraction part and a text mining part.
In the information extraction part we define dictionaries and templates to identify useful expressions.
In the text mining part we define appropriate viewpoints based on the identified expressions to get useful associations leading to useful insights.
4.2 Results of Trigger Segment Detection and Effective Expression Extraction
Figure 5: Transitions of acc(categorizer(D_k)) along the turns m_k
In Figure 5, seg(1, 2) and seg(10, 15) are detected as trigger segments.
We now know that these segments are highly correlated to the outcome of the call.
For each detected trigger segment, we extract effective expressions in each class using the metric described in Section 3.3.
Table 3 shows some expressions with high values for the metric for each trigger.
In this table, "just NUMERIC dollars" is a canonical expression and an expression such as "just 160 dollars" is mapped to this canonical expression in the information extraction process.
Table 3: Selected expressions in trigger segments

seg(1, 2): make, return, tomorrow, assist, reservation, tonight
seg(10, 15): number, corporate program, contract, card, have, tax surcharge, just NUMERIC dollars, discount, customer club, good rate, economy, go, impala

From this result, in seg(1, 2), "make" and "reservation" are correlated with "pick up", and "rate" and "check" are correlated with "not-picked up".
By looking at some documents containing these expressions, we found customer intention phrases such as "would like to make a reservation", "want to check a rate", etc. Therefore, it can be induced that the way a customer starts the call may have an impact on the outcome.
From expressions in seg(10,15), it can be said that discount-related phrases and mentions of the good rates by the agent can have an effect on the outcome.
We can directly apply the conventional methods for representative feature selection to D. The following expressions were selected as the top 20 expressions from the whole conversational data by using the χ² metric defined in (3).
corporate program, contract, counter, September, mile, rate, economy, last name, valid driving license, BRAND NAME, driving, telephone, midsize, tonight, use, credit, moment, airline, afternoon
From these results, we see that looking at the call as a whole does not reveal that discount-related phrases, or the first customer utterance, affect the outcome.
Detecting trigger segments and extracting important expressions from each trigger segment are key to identifying subtle differences between very similar looking calls that have entirely opposite outcomes.
4.3 Results of Text Mining Analysis using Selected Viewpoints and Expressions
From the detected segments and expressions we determined that the customer's first utterance along with discount phrases and value selling phrases affected the call outcomes.
Under these hypotheses, we prepared the following semantic categories.
• Customer intention at start of call: From the customer's first utterance, we extract the following intentions based on the patterns.
- strong start: would like to make a booking, need to pick up a car, . . .
- weak start: want to know the rate for vans, . . .
Under our hypotheses, the customer with a strong start has the intention of booking a car, and we classify such a customer as a booking_customer.
The customer with a weak start usually just wants to know the rates and is classified as a rates_customer.
• discount-related phrases: discount, corporate program, motor club, buying club . . . are registered into the domain dictionary as discount-related phrases.
• value selling phrases: we extract phrases mentioning good rates and good vehicles by matching patterns related to such utterances.
- mentions of good rates: good rate, wonderful price, save money, just need to pay
this low amount, . . .
- mentions of good vehicles: good car, fantastic car, latest model, . . .
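The pattern-based extraction for the first category can be sketched as follows; the pattern lists are illustrative fragments, not the full dictionaries used in the system:

```python
import re

# Pattern-based customer-intent extraction from the first utterance.

STRONG_START = [r"would like to make a (?:booking|reservation)",
                r"need to pick up a car"]
WEAK_START = [r"want to know the rate",
              r"(?:what|how much) .*\brate"]

def classify_customer(first_utterance):
    u = first_utterance.lower()
    if any(re.search(p, u) for p in STRONG_START):
        return "booking_customer"   # strong start
    if any(re.search(p, u) for p in WEAK_START):
        return "rates_customer"     # weak start
    return "unknown"

print(classify_customer("Hi, I would like to make a reservation for tomorrow"))
# → booking_customer
```

The discount-related and value selling categories are handled the same way, with their own pattern lists.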
Using these three categories, we tried to find insights to improve agent productivity.
Table 4 shows the result of two-dimensional association analysis for 137 reservation calls.
This table shows the association between customer types based on customer intention at the start of a call and pick up information.
From these results, 67% (47 out of 70) of the booking_customers picked up the reserved car and only 35% (13 out of 37) of the rates_customers picked it up.
Table 4: Association between customer types and pick up information (customer types extracted from texts based on customer intent at start of call)
This supports our hypothesis and means that pick up is predictable from the customer's first or second utterance.
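The Table 4 percentages can be recomputed directly from the counts reported in the text:

```python
# Recompute the Table 4 pick-up rates from the reported counts.
picked_up = {"booking_customer": 47, "rates_customer": 13}
total = {"booking_customer": 70, "rates_customer": 37}

rates = {c: picked_up[c] / total[c] for c in total}
for c, r in rates.items():
    print(f"{c}: {r:.0%}")
# → booking_customer: 67%
# → rates_customer: 35%
```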
It was found that cars booked by rates_customers tend to be "not picked up," so if we can find any actions by agents that convert such customers into "pick up," then the revenue will improve.
In the booking_customer case, to keep the "pick up" rate high, we need to determine specific agent actions that solidify the customer's intent.
Table 5 shows how mentioning discount-related phrases affects the pick up ratios for rates_customers and booking_customers.
Table 5: Association between mention of discount phrases and pick up information (for rates_customers and booking_customers: mention of discount phrases by agents, versus picked up / not picked up)
From this table, it can be seen that mentioning discount phrases affects the final status of both types of customers.
In the rates_customer case, the probability that the booked car will be picked up, P(pick-up), is improved to 0.476 by mentioning discount phrases.
This means customers are attracted by offering discounts and this changes their intention from "just checking rate" to "make a reservation here".
We found similar trends for the association between mention of value selling phrases and pick up information.
4.4 Improving Agent Productivity
From the results of the text mining analysis experiment, we derived the following actionable insights:
• There are two types of customers in reservation calls.
- Booking_customer (with strong start) tends to pick up the reserved car.
- Rates_customer (with weak start) tends not to pick up the reserved car.
• In the rates_customer case, "pick up" is improved by mentioning discount phrases.
By implementing the actionable insights derived from the analysis in an actual car rental process, we verified improvements in pick up.
We divided the 83 agents in the car rental reservation center into two groups.
One of them, consisting of 22 agents, was trained based on the insights from the text mining analysis.
The remaining 61 agents were not told about these findings.
By comparing these two groups over a period of one month, we hoped to see how the actionable insights contributed to improving agent performance.
As the evaluation metric, we used the pick up ratio, that is, the ratio of the number of "pick-ups" to the number of reservations.
Following the training the pick up ratio of the trained agents increased by 4.75%.
The average pick up ratio for the remaining agents increased by 2.08%.
Before training the ratios of both groups were comparable.
The seasonal trends in this industry mean that depending on the month the bookings and pickups may go up or down.
We believe this is why the average pick up ratio for the remaining agents also increased.
Considering this, it can be estimated that by implementing the actionable insights the pick up ratio for the pilot group was improved by about 2.67%.
We confirmed that this difference is meaningful: the p-value of the t-test is 0.0675, which is close to the standard significance level (α = 0.05).
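The significance check can be sketched as a Welch two-sample t statistic over per-agent changes in pick-up ratio. The per-agent values below are hypothetical (chosen only so the group means match the reported 4.75% and 2.08%); the paper itself reports only the group-level p-value:

```python
import math

# Welch two-sample t statistic; xs/ys are per-agent changes in pick-up
# ratio (percentage points). The values are hypothetical illustrations.

def welch_t(xs, ys):
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny)

trained = [5.1, 4.2, 6.0, 3.85, 4.6]   # mean 4.75, as reported
control = [2.2, 1.8, 2.5, 1.9, 2.0]    # mean 2.08, as reported
t = welch_t(trained, control)
```

The resulting t value would then be compared against the t distribution to obtain the p-value.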
Seeing this, the contact center trained all of its agents based on the insights from the text mining analysis.
5 Discussion
There has been a lot of work on specific tools for analyzing the conversational data collected at contact centers.
These include call type classification for the purpose of categorizing calls (Tang et al., 2003) (Zweig et al., 2006), call routing (Kuo and Lee, 2003) (Haffner et al., 2003), obtaining call log summaries (Douglas et al., 2005), agent assisting and monitoring (Mishne et al., 2005), and building of domain models (Roy and Subramaniam, 2006).
Filtering problematic dialogs automatically from an automatic speech recognizer has also been studied (Hastie et al., 2002) (Walker et al., 2002).
In contrast to these technologies, in this paper we consider the task of trying to find insights from a collection of complete conversations.
In (Nasukawa and Nagano, 2001), such an analysis was attempted for agent-entered call summaries of customer contacts by extracting phrases based on domain-expert-specified viewpoints.
In our work we have shown that even for conversational data, which is more complex, we could identify proper viewpoints and prepare expressions for each viewpoint.
Call summaries by agents tend to mask the customers' intention at the start of the call.
We get more valuable
insights from the text mining analysis of conversational data.
For such an analysis of conversational data, our proposed method has an important role.
With our method, we find the important segments in the data for doing analyses.
Also our analyses are closely linked to the desired outcomes.
In trigger detection, we created a chronologically cumulative data set based on turns.
We can also use the segment information such as the "opening" and "enquiries" described in Section 2.
We prepared data with segment information manually assigned, made the chronologically cumulative data and applied our trigger detection method.
Figure 6 shows the results of acc(categorizer(D_k)).
Figure 6: Result of acc(categorizer(D_k)) using segment information (conversation flow: opening, details, mandatory questions, closing)
The trend in Figure 6 is similar to that in Figure 5.
From this result, it is observed that "opening" and "offering" segments are trigger segments.
Usually, segmentation is not done in advance and to assign such information automatically we need data with labeled segmentation information.
The results show that even in the absence of labeled data our trigger detection method identifies the trigger segments.
In the experiments in Section 4, we set turns for each chronologically cumulative data by taking into account the pre-defined call flow.
In Figure 5 we observe that the accuracy of the categorizer decreases even as increasing portions of the call are used.
Even the accuracy using the complete call is less than that using only the first turn.
This indicates that the first turn is very informative, but it also indicates that the features are not being used judiciously.
In a conventional classification task, the number of features is sometimes restricted when constructing a categorizer.
It is known that selecting only significant features improves the classification accuracy (Yang and Pedersen, 1997).
We used Information Gain for selecting features from the document collection.
This method selects the most discriminative features between two classes.
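Information Gain over the same 2x2 counts can be sketched as follows; this is our reading of the Yang and Pedersen criterion, not necessarily the paper's exact implementation:

```python
import math

# Information Gain for a feature w over two classes, using the same 2x2
# counts as the chi-square table: a/b = docs containing w in "A"/"not A",
# c/d = docs not containing w in "A"/"not A".

def entropy(pos, neg):
    total = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h

def info_gain(a, b, c, d):
    n = a + b + c + d
    h_class = entropy(a + c, b + d)        # H(C)
    h_given_w = entropy(a, b)              # H(C | w present)
    h_given_not_w = entropy(c, d)          # H(C | w absent)
    return h_class - ((a + b) / n) * h_given_w - ((c + d) / n) * h_given_not_w

# rank all candidate features by info_gain and keep only the top 100-300
```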
As expected the classification accuracy improved significantly as we reduced the total number of features from over 2,000 to the range of 100 to 300.
Figure 7 shows the changes in accuracy.
In the proposed method, we detect trigger segments using the increases and decreases of the classification accuracy.
With feature selection, noisy features are not added in the segments, so decreasing portions are not observed.
In this situation, we can detect as a trigger segment any portion where the gradient of the accuracy curve increases.
Also using feature selection, we find that the classification accuracy is highest when using the entire document, which is expected.
However, we notice that the trigger segments obtained with and without feature selection are almost the same.
In the experiment, we use manually transcribed data.
As future work we would like to use the noisy output of an automatic speech recognition system to obtain viewpoints and expressions.
6 Conclusion
In this paper, we have proposed methods for identifying appropriate segments and expressions automatically from the data for user specified difference analysis.
We detected the trigger segments using the property that a business-oriented conversation follows a pre-defined flow.
After that, we identified the appropriate expressions from each trigger segment.
It was found that in a long business-oriented conversation there are important segments affecting the outcomes that cannot be easily detected by just looking through the conversation, but such segments can be detected by monitoring the changes in the categorization accuracy.
For the trigger segment detection, we do not use semantic segment information but only positional segment information based on the conversational turns.
Because our method does not rely on semantic information in the data, it can be seen as robust.
Through experiments with real conversational data, using identified segments and expressions we were able to define appropriate viewpoints and concepts leading to insights for improving the car rental business process.
Acknowledgment
The authors would like to thank Sreeram Balakrishnan, Raghuram Krishnapuram, Hideo Watanabe, and Koichi Takeda at IBM Research for their support.
The authors also appreciate the efforts of Jatin Joy Giri at IBM India in providing domain knowledge about the car rental process and thank him for help in constructing the dictionaries.
