Richard Loftis


2026

Nutrition misinformation on social media often arises from selective interpretation of scientific evidence rather than outright falsehoods, which makes it difficult to detect. We introduce a curated, expert-annotated Instagram dataset focused on seed oils and omega-6 fatty acids, two domains characterized by contested dietary claims. We evaluate feature-based, embedding-based, and transformer-based models in both in-domain and cross-domain settings. Results show strong in-domain performance across all models, with Sentence-BERT achieving the highest AUPRC (up to 0.96). However, performance drops substantially under cross-domain transfer, indicating limited robustness to topic shift. Our analysis suggests that while contextual embeddings capture strong in-domain semantic signals, linguistically and psychologically grounded features are more stable under distribution shift. These findings highlight the value of combining semantic and interpretable linguistic signals for robust misinformation detection.