# What's Mine becomes Yours: Detecting Context-Dependent Paraphrases in News Interview Dialogs

You are probably interested either in the **annotation data** or in using a **computational model to predict paraphrases in dialog**.



## Annotation Data

The quickest way to use the annotation data is to load `ANON_ALL-TOKEN-annotations.tsv`, which likely contains all the information you need. To use the data split, load `ANON_TRAIN-TOKEN-annotations.tsv`, `ANON_DEV-TOKEN-annotations.tsv` and `ANON_TEST-TOKEN-annotations.tsv` from the folder `result/Annotations/Paraphrase Annotations`. All files have the format shown below.

"Guest HLs" gives, for each token in "Guest Tokens" (in the same order), the proportion of annotators who highlighted it, counting only the annotators who classified the pair as a paraphrase. "Host HLs" does the same for "Host Tokens".
"Vote" gives the number of annotators who classified the pair as a paraphrase and the total number of annotators. For example, [10, 20] means that 10 out of 20 annotators classified the pair as a paraphrase.

| QID          | Guest Tokens                                                 | Guest HLs                         | Host Tokens                                                  | Host HLs                                              | Vote     |
| ------------ | ------------------------------------------------------------ | --------------------------------- | ------------------------------------------------------------ | ----------------------------------------------------- | -------- |
| CNN-177596-7 | ['This', 'is', 'not', 'good.']                               | [1.0, 0.9, 0.9, 0.9]              | ['This', 'is', 'what', 'you', "don't", 'want', 'happening', 'with', 'your', 'menorah,',... | [0.9, 0.8, 0.9, 0.9, 0.9, 0.9, 0.9, 0.7, 0.7, 0.7,... | [10, 20] |
| NPR-8678-6   | ['Well,', 'earlier', 'this', 'month,', "Guatemala's", 'highest', 'court', 'had', 'blocked', 'the',... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | ['Did', 'President', "Trump's", 'threat', 'to', 'the', 'Guatemalan', 'leadership', 'and', 'government',... | [0.2, 0.4, 0.4, 0.4, 0.4, 0.4, 0.6, 0.6, 0.4, 0.4,... | [5, 20]  |
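
If you prefer not to install the repository code, the TSV files can also be read with the Python standard library. The sketch below assumes that the files are tab-separated, have a header row with the column names shown in the table, and store the list-valued columns as Python literals (as the table suggests). `read_annotations`, `vote_share` and `highlighted_tokens` are hypothetical helpers, not part of this repository, and the 0.5 highlight cutoff is only an illustrative choice.

```python
import ast
import csv

# Columns assumed to hold Python-style list literals, e.g. "['This', 'is']" or "[10, 20]".
LIST_COLUMNS = ("Guest Tokens", "Guest HLs", "Host Tokens", "Host HLs", "Vote")


def read_annotations(lines):
    """Parse annotation rows from a tab-separated file object (or any iterable of lines)."""
    rows = []
    for raw in csv.DictReader(lines, delimiter="\t"):
        row = {"QID": raw["QID"]}
        for column in LIST_COLUMNS:
            # literal_eval turns the string into a list without executing arbitrary code.
            row[column] = ast.literal_eval(raw[column])
        rows.append(row)
    return rows


def vote_share(row):
    """Fraction of all annotators who classified the pair as a paraphrase."""
    paraphrase_votes, total_annotators = row["Vote"]
    return paraphrase_votes / total_annotators


def highlighted_tokens(row, speaker="Host", threshold=0.5):
    """Tokens highlighted by at least `threshold` of the paraphrase voters (illustrative cutoff)."""
    tokens, shares = row[f"{speaker} Tokens"], row[f"{speaker} HLs"]
    return [token for token, share in zip(tokens, shares) if share >= threshold]
```

For example, `with open("result/Annotations/Paraphrase Annotations/ANON_DEV-TOKEN-annotations.tsv", newline="", encoding="utf-8") as f: dev = read_annotations(f)` loads the development split (the UTF-8 encoding is an assumption). For the CNN row above, `vote_share` returns 0.5.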

Alternatively, load the annotations with the code in this repository:

```python
from paraphrase.annotation_data import get_human_anns

# Load all annotated pairs.
# For a single split, pass question_ids="TRAIN", "DEV" or "TEST".
guest_tokens_per_qid, host_tokens_per_qid, human_anns_per_qid, human_class_per_qid = (
    get_human_anns())
```



## Predicting Paraphrases

### Token Classifier

Our trained token classifier will be available on the Hugging Face Hub (link will be released after blind review).

### In-Context Learning

The following prompt worked best on our dataset. It is tailored to the interview setting; we have not yet tried variants for other types of dialog data.

```
A Paraphrase is a rewording or repetition of content in the guest's statement. It rephrases what the guest said.

Given an interview on - with the summary: Fresh Prince Star Alfonso Ribeiro Sues Over Dance Moves; Rapper 2 Milly Alleges His Dance Moves were Copied.
Guest and Host say the following:
Guest (TERRENCE FERGUSON, RAPPER): I guess it was season 5 when they premiered it in the game. A bunch of DMs, a bunch of Twitter requests, e-mails, everything was like, you, your game is in the dance, you need to sue, "Fortnite" stole it. Even like big artists, major artists like Joe Buttons and stuff, they have their own like show, daily struggle, they say, you, you must sue "Fortnite", and I'm like, "Fortnite", what is that? I don't even know what it is --
Host (QUEST): So you weren't even familiar?
In the reply, does the host paraphrase something specific the guest says?

Explanation: Let's think step by step.
Terrence Ferguson says at the end of his turn that he didn't know Fortnite.
Quest, the host of the interview, repeats that the guest doesn't know Fortnite.
So they both say that the guest didn't know Fortnite. Therefore, the answer is yes, the host is paraphrasing the guest.
Verbatim Quote Guest: "I'm like, "Fortnite", what is that?  I don't even know what it is"
Verbatim Quote Host: "you weren't even familiar?"
Classification: Yes.


Given an interview on 2013-10-1 with the summary: Interview With Idaho Congressman Raul Labrador
Guest and Host say the following:
Guest (REP. RAUL LABRADOR (R), IDAHO): That's what we have been asking the president. We would like the senators to actually come and negotiate with us. So I think that would be a terrific idea.
Host (BLITZER): You say you want to negotiate, but what about the debt ceiling? Are you ready to see that go up without any strings attached, as the president demands?
In the reply, does the host paraphrase something specific the guest says?

Explanation: Let's think step by step.
Republican Raul Labrador says that "we" want to negotiate with the senators. By that he probably means the Republicans.
Blitzer, the host of the interview, says that "you" want to negotiate. Blitzer probably means the Republicans as well.
So both of them are saying that the Republicans want to negotiate. Therefore, the answer is yes, host is paraphrasing the guest.
Verbatim Quote Guest: "We would like the senators to actually come and negotiate with us."
Verbatim Quote Host: "you want to negotiate"
Classification: Yes.


Given an interview on 2015-12-15 with the summary: Interview with Kentucky Senator Rand Paul
Guest and Host say the following:
Guest (SEN. RAND PAUL (R-KY), PRESIDENTIAL CANDIDATE): If you're not in our country, there are no constitutional protections for you.
Host (TAPPER): So, you don't have a problem with Facebook giving the government access to the private accounts of people applying to enter the U.S.?
In the reply, does the host paraphrase something specific the guest says?

Explanation: Let's think step by step.
Rand Paul says that there are no constitutional protections for people outside "our" country. Since Rand Paul was a Kentucky Senator in 2015, he is referring to the United States.
The host of the interview, Tapper, asks if Paul has no problem with Facebook giving the U.S. government access to accounts of people who apply to enter the United States.
 While Tapper and Paul both talk about people outside the U.S., Paul talks about constitutional protections in the U.S. and Tapper infers Paul's opinion on a company giving the government access to private accounts of their users.
Tapper, the host, adds a conclusion to what the guest said ("so you don't have a problem with ...") without rewording or repeating the content of the guest's utterance. Therefore, the answer is no, the host does not reword or repeat the guest.
Verbatim Quote Guest: None.
Verbatim Quote Host: None.
Classification: No.


Given an interview on 2005-7-20 with the summary: CIA Operative Talks About Life After Cover Blown
Guest and Host say the following:
Guest (MASSIMO CALABRESI, "TIME" MAGAZINE): She was very pleasant. Talked about family life. They chatted about errands they need to run and things like that.
Host (PHILLIPS): Well, she talked a lot about her family and her kids. And you get a personal sense for how they're living day by day.
In the reply, does the host paraphrase something specific the guest says?

Explanation: Let's think step by step.
Massimo Calabresi talks about a conversation with a person he refers to as "she". How "she" seemed and what the conversation was about: Family life and errands.
Phillips, the interview host, also talks about that same conversation with "she". That "she" talked about family and daily life.
 Therefore, the answer is yes, the host is paraphrasing the guest.
Verbatim Quote Guest: "She" "Talked about family life." "errands they need to run and things like that."
Verbatim Quote Host: "she talked" "about her family and her kids." "how they're living day by day."
Classification: Yes.


Given an interview on 2018-05-29 with the summary: Two weeks after the wave of protests and deadly clashes at the Israeli border, many Gazans are wounded and feeling like the demonstrations didn't bring any tangible benefits.
Guest and Host say the following:
Guest (DANIEL ESTRIN, BYLINE): And so that's the main question I've been asking people here, is, was the price worth it?
Host (STEVE INSKEEP, HOST): You're telling me people on the ground don't see it that way.
In the reply, does the host paraphrase something specific the guest says?

Explanation: Let's think step by step.
Daniel Estrin says that he asked people if the price was worth it.
 Steve Inskeep, the host of the interview, asks if people "don't see it that way," asking what people responded to Estrin's question to people.
However, the host does not paraphrase the question itself. Therefore, the answer is no, the host does not reword or repeat the guest.
Verbatim Quote Guest: None.
Verbatim Quote Host: None.
Classification: No.


Given an interview on 2005-7-28 with the summary: Pregnant Philadelphia Woman Still Missing
Guest and Host say the following:
Guest (TONY HANSON, KYW NEWSRADIO): Police have indicated that they have been getting cooperation from the people involved, of course, they are looking at all of her personal relationships to see if there were any problems there.
Host (PHILLIPS): I know you've talked to various members of her family. What did they tell you?
In the reply, does the host paraphrase something specific the guest says?

Explanation: Let's think step by step.
KYW Newsradio's Tony Hanson says that the police are investigating "her" personal relationships.
Phillips says KYW Newsradio's Hanson has spoken to various members of "her" family.
Hanson, the guest, talks about the police conducting interviews, while, the host, Phillips, talks about the reporter Hanson conducting interviews. Therefore, the answer is no, the host is not repeating the actual content of the guest's statement. The host rather continues on with a related topic.
Verbatim Quote Guest: None.
Verbatim Quote Host: None.
Classification: No.


Given an interview on 2005-5-26 with the summary: Lionel Tate is back in jail for allegedly holding up a pizza deliveryman. The 18-year-old Florida youth was the youngest person to be sent to prison for life in U.S. history. At age 12, he was accused of murdering his 6-year-old neighbor and friend Tiffany Eunick when he claimed he was demonstrating wrestling moves. Ed Gordon talks with Sgt. DeLacy Davis from New Jersey, a mentor and one of several supporters that put together a "re-entry into society plan" for Tate.
Guest and Host say the following:
Guest (DE LACY DAVIS): I think that, God willing, and then certainly if we were given another shot at this apple, I think the entire group would be amenable to shipping him here to me, which is what we felt would be a better environment to give him a new start. People, places and things needed to be changed, and consistently changed, and the plan adjusted based upon how he was faring.
Host (ED GORDON, host): So that would be, actually, coming to New Jersey and being under the auspices, frankly, of De Lacy Davis.
In the reply, does the host paraphrase something specific the guest says?

Explanation: Let's think step by step.
The guest, De Lacy Davis, talks about a plan to give Lionel Tate a fresh start by sending him to De Lacy Davis ("shipping him here to me").  From the summary, we know that De Lacy Davis is probably in New Jersey.
Ed Gordon, the interview host, clarifies this and says the plan is for Tate to come to New Jersey and  be mentored by De Lacy Davis.
Both are talking about Tate being sent to New Jersey, to De Lacy Davis. Therefore, the answer is yes, the host paraphrases the guest.
Verbatim Quote Guest: "shipping him here to me"
Verbatim Quote Host: "coming to New Jersey and being under the auspices" "of De Lacy Davis."
Classification: Yes.


Given an interview on  2000-8-3 with the summary: Kissinger: Bush is Fully Qualified for Foreign Policy Decisions
Guest and Host say the following:
Guest (HENRY KISSINGER, FMR. SECRETARY OF STATE): No, I haven't talked to him but I've talked to his secretary and we've passed messages back and forth between the family and me. And they tell me he's improved a lot. And I'm to see him tomorrow morning.
Host (NATALIE ALLEN, CNN ANCHOR): I'm sure that will cheer him up, to have a visit from you, and the doctors did say, just a short while ago, he's expected make a full recovery.
In the reply, does the host paraphrase something specific the guest says?

Explanation: Let's think step by step.
Former Secretary of State Henry Kissinger talks about someone ("him") who is sick and possibly hospitalized. Kissinger says that he will see "him" tomorrow morning.
Natalie Allen, the CNN anchor, says it will cheer "him" up when Kissinger visits.
Both the interview guest Kissinger and the interview host Allen mention that Kissinger will visit "him". Therefore, the answer is yes, the host is paraphrasing the guest.
Verbatim Quote Guest: "I'm to see him."
Verbatim Quote Host: "him" "have a visit from you"
Classification: Yes.


Given an interview on DATE with the summary: SUMMARY
Guest and Host say the following:
Guest (GUEST-NAME): GUEST-UTTERANCE
Host (HOST-NAME): HOST-UTTERANCE
In the reply, does the host paraphrase something specific the guest says?

Explanation: Let's think step by step.
```
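
To query a model with this prompt, the placeholders in the final block (DATE, SUMMARY, GUEST-NAME, GUEST-UTTERANCE, HOST-NAME, HOST-UTTERANCE) have to be filled for each guest/host pair, and the label has to be read back from the model's answer. The sketch below shows one way to do both; `build_query` and `parse_response` are hypothetical helpers, and the parser assumes the model follows the answer format of the examples (`Verbatim Quote Guest:`, `Verbatim Quote Host:`, `Classification: Yes/No`). It fills only the final block, so the definition and worked examples still have to be prepended. Running `str.replace` over the whole prompt instead would also rewrite "CANDIDATE" in the Rand Paul example, because it contains "DATE".

```python
import re

# Final block of the prompt above; the definition and worked examples go in front of it.
QUERY_TEMPLATE = (
    "Given an interview on DATE with the summary: SUMMARY\n"
    "Guest and Host say the following:\n"
    "Guest (GUEST-NAME): GUEST-UTTERANCE\n"
    "Host (HOST-NAME): HOST-UTTERANCE\n"
    "In the reply, does the host paraphrase something specific the guest says?\n"
    "\n"
    "Explanation: Let's think step by step."
)
PLACEHOLDERS = re.compile(r"GUEST-UTTERANCE|HOST-UTTERANCE|GUEST-NAME|HOST-NAME|SUMMARY|DATE")

# Answer lines as they appear in the worked examples.
CLASSIFICATION = re.compile(r"^Classification:[ \t]*(yes|no)\b", re.IGNORECASE | re.MULTILINE)
QUOTE = re.compile(r"^Verbatim Quote (Guest|Host):[ \t]*(.*)$", re.MULTILINE)


def build_query(date, summary, guest_name, guest_utterance, host_name, host_utterance):
    """Fill all placeholders in a single regex pass, so inserted text is never re-substituted."""
    values = {
        "DATE": date,
        "SUMMARY": summary,
        "GUEST-NAME": guest_name,
        "GUEST-UTTERANCE": guest_utterance,
        "HOST-NAME": host_name,
        "HOST-UTTERANCE": host_utterance,
    }
    return PLACEHOLDERS.sub(lambda match: values[match.group(0)], QUERY_TEMPLATE)


def parse_response(completion):
    """Return (True/False/None for Yes/No/missing, {"guest": quote, "host": quote}).

    Pass only the model's completion: the worked examples in the prompt contain
    their own "Classification:" lines, and the first match is returned.
    """
    match = CLASSIFICATION.search(completion)
    label = None if match is None else match.group(1).lower() == "yes"
    quotes = {speaker.lower(): quote.strip() for speaker, quote in QUOTE.findall(completion)}
    return label, quotes
```

A full prompt would then be `few_shot_prefix + build_query(...)`, where `few_shot_prefix` (a hypothetical name) holds everything above the final "Given an interview on DATE" block; the result can be sent to any instruction-following model.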



## Requirements

Installed with Python 3.11.7.

The required packages are listed in `requirements.txt`; install them with `pip install -r requirements.txt`.