@inproceedings{woldemariam-2020-assessing,
    title = "Assessing Users' Reputation from Syntactic and Semantic Information in Community Question Answering",
    author = "Woldemariam, Yonas",
    editor = "Calzolari, Nicoletta  and
      B{\'e}chet, Fr{\'e}d{\'e}ric  and
      Blache, Philippe  and
      Choukri, Khalid  and
      Cieri, Christopher  and
      Declerck, Thierry  and
      Goggi, Sara  and
      Isahara, Hitoshi  and
      Maegaard, Bente  and
      Mariani, Joseph  and
      Mazo, H{\'e}l{\`e}ne  and
      Moreno, Asuncion  and
      Odijk, Jan  and
      Piperidis, Stelios",
    booktitle = "Proceedings of the Twelfth Language Resources and Evaluation Conference",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://preview.aclanthology.org/ingest-emnlp/2020.lrec-1.662/",
    pages = "5383--5391",
    language = "eng",
    ISBN = "979-10-95546-34-4",
    abstract = "Textual content is the most significant and substantially the largest part of CQA (Community Question Answering) forums. Users gain reputation for contributing such content. Although linguistic quality is the very essence of textual information, it does not seem to be considered in estimating users' reputation. As existing user reputation systems appear to rely solely on vote counting, adding that bit of linguistic information should improve their quality. In this study, we investigate the relationship between users' reputation and linguistic features extracted from their associated answer content, and we build statistical models on a Stack Overflow dataset that learn reputation from complex syntactic and semantic structures of such content. The resulting models reveal how users' writing styles in answering questions play important roles in building reputation points. In our experiments, we extract answers from systematically selected users, annotate them with linguistic features, and build models on the annotated data. The models are evaluated on in-domain (e.g., Server Fault, Super User) and out-of-domain (e.g., English, Maths) datasets. We find that the selected linguistic features have quite significant influence on reputation scores. In the best-case scenario, the selected linguistic feature set could explain 80{\%} of the variation in reputation scores with a prediction error of 3{\%}. The performance results obtained from the baseline models are significantly improved by adding syntactic and punctuation-mark features."
}