Samuel Kriman
2025
SWAN: An Efficient and Scalable Approach for Long-Context Language Modeling
Krishna C Puvvada | Faisal Ladhak | Santiago Akle Serano | Cheng-Ping Hsieh | Shantanu Acharya | Somshubra Majumdar | Fei Jia | Samuel Kriman | Simeng Sun | Dima Rekesh | Boris Ginsburg
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
We present SWAN, a causal, decoder-only Transformer architecture that generalizes robustly to sequence lengths substantially longer than those seen during training. SWAN interleaves layers without positional encodings (NoPE) with sliding-window attention layers equipped with rotary positional encodings (SWA-RoPE), and applies dynamic scaling to attention scores during inference. Experiments demonstrate that SWAN achieves strong length extrapolation without additional long-context training. SWAN is also more computationally efficient than the standard Transformer architecture, yielding lower training cost and higher inference throughput. We further show that existing pre-trained decoder-only models can be adapted to the SWAN architecture with minimal continued training, enabling extended context lengths. Overall, our work presents an effective approach for scaling language models to longer contexts in a robust and efficient manner.
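As a rough illustration of the interleaving described in the abstract, the PyTorch sketch below alternates global attention layers without positional encodings and sliding-window attention layers with RoPE, and tempers attention logits once the sequence exceeds the training length. This is a minimal sketch, not the authors' code: the even/odd alternation pattern, the window size, and the log-ratio scaling rule are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal sketch of SWAN-style interleaved attention (not the authors' code).
# Assumed for illustration: even layers are NoPE-global, odd layers are
# SWA-RoPE; the window size and the log-ratio scaling rule are placeholders.
import math
import torch


def rotary(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate-half rotary embedding over (batch, heads, seq, head_dim)."""
    *_, T, D = x.shape
    half = D // 2
    freqs = base ** (-torch.arange(half, dtype=x.dtype, device=x.device) / half)
    ang = torch.arange(T, dtype=x.dtype, device=x.device)[:, None] * freqs
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * ang.cos() - x2 * ang.sin(),
                      x1 * ang.sin() + x2 * ang.cos()], dim=-1)


class SwanAttention(torch.nn.Module):
    """One attention layer: NoPE-global if window is None, else SWA-RoPE."""

    def __init__(self, dim, n_heads, window=None, use_rope=False, train_len=1024):
        super().__init__()
        self.qkv = torch.nn.Linear(dim, 3 * dim, bias=False)
        self.out = torch.nn.Linear(dim, dim, bias=False)
        self.h, self.window = n_heads, window
        self.use_rope, self.train_len = use_rope, train_len

    def forward(self, x):  # x: (batch, seq, dim)
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (a.view(B, T, self.h, C // self.h).transpose(1, 2)
                   for a in (q, k, v))
        if self.use_rope:
            q, k = rotary(q), rotary(k)
        scale = (C // self.h) ** -0.5
        if T > self.train_len:
            # Dynamic attention-score scaling beyond the training length
            # (illustrative log-ratio rule; the paper's rule may differ).
            scale *= math.sqrt(math.log(T) / math.log(self.train_len))
        att = (q @ k.transpose(-2, -1)) * scale
        i = torch.arange(T, device=x.device)
        mask = i[:, None] >= i[None, :]                      # causal
        if self.window is not None:
            mask &= (i[:, None] - i[None, :]) < self.window  # sliding window
        att = att.masked_fill(~mask, float("-inf")).softmax(dim=-1)
        return self.out((att @ v).transpose(1, 2).reshape(B, T, C))


# Interleave NoPE-global (even) and SWA-RoPE (odd) layers with residuals,
# then run a sequence longer than train_len to exercise the dynamic scaling.
layers = torch.nn.ModuleList(
    SwanAttention(256, 4,
                  window=None if i % 2 == 0 else 128,
                  use_rope=(i % 2 == 1))
    for i in range(8))
x = torch.randn(1, 2048, 256)  # 2x the assumed train_len of 1024
for layer in layers:
    x = x + layer(x)
```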
2021
Joint Detection and Coreference Resolution of Entities and Events with Document-level Context Aggregation
Samuel Kriman
|
Heng Ji
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop
Constructing knowledge graphs from unstructured text is an important task relevant to many domains. Most previous work focuses on extracting information from sentences or paragraphs, due to the difficulty of analyzing longer contexts. In this paper, we propose a new jointly trained model for document-level information extraction that performs entity and event identification, typing, and coreference resolution. To improve entity and event typing, we use context-aware representations aggregated from the detected mentions of the corresponding entities and events across the entire document. By extending our system to the document level, we incorporate cross-sentence dependencies and additional contextual information that may not be available at the sentence level, allowing for more globally optimized predictions. We evaluate our system on documents from the ACE05-E+ dataset and find significant improvement over the sentence-level state of the art on entity and event trigger identification and classification.
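To make the aggregation step concrete, here is a small PyTorch sketch, not the authors' code, in which each mention's typing representation is concatenated with a pooled representation of all detected mentions in its coreference cluster across the document. The function name and the choice of mean pooling are assumptions for illustration.

```python
# Illustrative sketch of document-level mention aggregation (not the
# authors' code): each mention's representation is augmented with the
# mean of all mentions in its coreference cluster across the document.
import torch


def aggregate_cluster_context(mention_reprs: torch.Tensor,
                              cluster_ids: torch.Tensor) -> torch.Tensor:
    """mention_reprs: (num_mentions, dim); cluster_ids: (num_mentions,).
    Returns each mention's repr concatenated with its cluster mean,
    giving a document-level context feature for typing."""
    n, d = mention_reprs.shape
    out = torch.empty(n, 2 * d)
    for cid in cluster_ids.unique():
        idx = (cluster_ids == cid).nonzero(as_tuple=True)[0]
        cluster_mean = mention_reprs[idx].mean(dim=0)
        out[idx] = torch.cat(
            [mention_reprs[idx], cluster_mean.expand(len(idx), d)], dim=-1)
    return out


# Example: 5 detected mentions grouped into two entity clusters.
reprs = torch.randn(5, 128)
clusters = torch.tensor([0, 1, 0, 0, 1])
typed_inputs = aggregate_cluster_context(reprs, clusters)  # shape (5, 256)
```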
Co-authors
- Shantanu Acharya 1
- Santiago Akle Serano 1
- Boris Ginsburg 1
- Cheng-Ping Hsieh 1
- Heng Ji 1