Jan Šnajder
I am an Associate Professor at the Faculty of Electrical Engineering
and Computing (FER) at the University of Zagreb and a member of the Text
Analysis and Knowledge Engineering Lab (TakeLab). My research interests
are in natural language processing (NLP) and machine
learning. My current focus is on lexical semantics, information extraction, and opinion mining. I am a fan of
functional programming, Haskell in particular.
Short Bio
I received my MSc and PhD degrees in Computer Science
from the University of Zagreb, Faculty of Electrical Engineering and
Computing (UNIZG FER), Zagreb, Croatia in 2006 and 2010, respectively.
From 2002 I worked as a research assistant, and since
2016 I have been an Associate Professor at UNIZG FER.
In 2012 and 2013 I was a visiting researcher at the Department of
Computational Linguistics at
Heidelberg University. In 2015 I was a visiting researcher at the NICT in Kyoto, and in 2014 and 2015 a visiting researcher at the IMS, Stuttgart University. In 2016 I was a visiting researcher at the Department of Computing and Information Systems, University of Melbourne.
Curriculum vitae
Teaching
Publications
2019
-
Jan Šnajder, Tamara Sladoljev-Agejev, Svjetlana Kolić-Vehovec (2019).
Analysing Rhetorical Structure as a Key Feature of Summary Coherence.
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2019), Florence, 46–51.
[paper]
-
Antonio Šajatović, Maja Buljan, Jan Šnajder, Bojana Dalbelo Bašić (2019).
Evaluating Automatic Term Extraction Methods on Individual Documents.
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019), Florence, 149–154.
[paper]
-
Mladen Karan, Jan Šnajder (2019).
Preemptive Toxic Language Detection in Wikipedia Comments Using Thread-Level Context.
Proceedings of the Third Workshop on Abusive Language Online (ALW3), Florence, 129–134.
[paper]
-
Niko Palić, Juraj Vladika, Dominik Čubelić, Ivan Lovrenčić, Maja Buljan, Jan Šnajder (2019).
TakeLab at SemEval-2019 Task 4: Hyperpartisan News Detection.
Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval-2019), Minneapolis, 995–998.
[paper]
2018
-
Ivan Sekulić, Matej Gjurković, Jan Šnajder (2018).
Not Just Depressed: Bipolar Disorder Prediction on Reddit.
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA 2018), Brussels, 72–78.
[paper]
-
Mladen Karan, Jan Šnajder (2018).
Cross-Domain Detection of Abusive Language Online.
Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), Brussels, 132–137.
[paper]
-
Martin Tutek, Jan Šnajder (2018).
Iterative Recursive Attention Model for Interpretable Sequence Classification.
Proceedings of Analyzing and Interpreting Neural Networks for NLP (BlackBoxNLP 2018), Brussels, 249–257.
[paper]
-
Viktor Golem, Mladen Karan, Jan Šnajder (2018).
Combining Shallow and Deep Learning for Aggressive Text Detection.
Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018), Santa Fe, 188–198.
[paper]
-
Damir Korenčić, Strahil Ristov, Jan Šnajder (2018).
Document-Based Topic Coherence Measures for News Media Text.
Expert Systems with Applications, 114: 357–373.
[paper]
-
Maja Buljan, Sebastian Padó, Jan Šnajder (2018).
Lexical Substitution for Evaluating Compositional Distributional Models.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2018), 206–211.
[paper]
[data]
-
Matej Gjurković, Jan Šnajder (2018).
Reddit: A Gold Mine for Personality Prediction.
Proceedings of the Second Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media (PEOPLES 2018), New Orleans, 87–97.
[paper]
-
Martin Gluhak, Maria Pia di Buono, Abbas Akkasi, Jan Šnajder (2018).
TakeLab at SemEval-2018 Task 7: Combining Sparse and Dense Features for Relation Classification in Scientific Texts.
Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018), 842–847.
[paper]
-
Ana Brassard, Tin Kuculo, Filip Boltužić and Jan Šnajder (2018).
TakeLab at SemEval-2018 Task 12: Argument Reasoning Comprehension with Skip-Thought Vectors.
Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018), 842–847.
[paper]
-
Domagoj Alagić, Jan Šnajder, Sebastian Padó (2018).
Leveraging Lexical Substitutes for Unsupervised Word Sense Induction.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, 5004–5011.
[paper]
-
Mladen Karan, Jan Šnajder (2018).
Paraphrase-Focused Learning to Rank for Domain-Specific Frequently Asked Questions Retrieval.
Expert Systems with Applications, 91: 418–433.
[paper]
2017
-
Julian Brooke, Jan Šnajder, Timothy Baldwin (2017).
Unsupervised Acquisition of Comprehensive Multiword Lexicons using Competition in an n-gram Lattice.
Transactions of the Association for Computational Linguistics, 5: 455–470.
[paper]
-
Maria Pia di Buono, Jan Šnajder (2017).
Linguistic Features and Newsworthiness: An Analysis of News Style.
Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017), Rome.
[paper]
-
Tamara Sladoljev-Agejev, Jan Šnajder (2017).
Using Analytic Scoring Rubrics in the Automatic Assessment of College-Level Summary Writing Tasks in L2.
Proceedings of the Eighth International Joint Conference on Natural Language Processing (IJCNLP 2017), Taipei, 181–186.
[paper]
-
Maria Pia di Buono, Jan Šnajder, Bojana Dalbelo Bašić, Goran Glavaš, Martin Tutek, Nataša Milic-Frayling (2017).
Predicting News Values from Headline Text and Emotions.
Proceedings of the 2017 EMNLP Workshop on Natural Language Processing Meets Journalism, Copenhagen, 1–6.
[paper]
-
Filip Boltužić and Jan Šnajder (2017).
Toward Stance Classification Based on Claim Microstructures.
Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA 2017), Copenhagen, 74–80.
[paper]
-
Zoran Medić, Jan Šnajder, Sebastian Padó (2017).
Does Free Word Order Hurt? Assessing the Practical Lexical Function Model for Croatian.
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017), Vancouver, 115–120.
[paper]
[data]
-
Marin Kukovačec, Juraj Malenica, Ivan Mršić, Antonio Šajatović, Domagoj Alagić, Jan Šnajder (2017).
TakeLab at SemEval-2017 Task 6: #RankingHumorIn4Pages.
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), Vancouver, 396–400.
[paper]
-
Leon Rotim, Martin Tutek, Jan Šnajder (2017).
TakeLab at SemEval-2017 Task 5: Linear Aggregation of Word Embeddings for Fine-Grained Sentiment Analysis of Financial News.
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), Vancouver, 866–871.
[paper]
-
David Lozić, Doria Šarić, Ivan Tokić, Zoran Medić, Jan Šnajder (2017).
TakeLab at SemEval-2017 Task 4: Recent Deaths and the Power of Nostalgia in Sentiment Analysis in Twitter.
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), Vancouver, 784–789.
[paper]
-
Filip Šaina, Toni Kukurin, Lukrecija Puljić, Mladen Karan, Jan Šnajder (2017).
TakeLab-QA at SemEval-2017 Task 3: Classification Experiments for Answer Retrieval in Community QA.
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), Vancouver, 339–343.
[paper]
-
Mladen Karan, Jan Šnajder (2017).
Detecting Non-covered Questions in Frequently Asked Questions Collections.
International Conference on Applications of Natural Language to Information Systems (NLDB 2017), Liège, 387–390.
[paper]
-
Jakub Piskorski, Lidia Pivovarova, Jan Šnajder, Josef Steinberger, Roman Yangarber (2017).
The First Cross-Lingual Challenge on Recognition, Normalization and Matching of Named Entities in Slavic Languages.
Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing (BSNLP 2017), Valencia, 76–85.
[paper]
-
Domagoj Alagić, Jan Šnajder (2017).
A Preliminary Study of Croatian Lexical Substitution.
Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing (BSNLP 2017), Valencia, 14–19.
[paper]
-
Leon Rotim, Jan Šnajder (2017).
Comparison of Short-Text Sentiment Analysis Methods for Croatian.
Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing (BSNLP 2017), Valencia, 69–75.
[paper]
-
Paula Gombar, Zoran Medić, Domagoj Alagić, Jan Šnajder (2017).
Debunking Sentiment Lexicons: A Case of Domain-Specific Sentiment Classification for Croatian.
Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing (BSNLP 2017), Valencia, 54–59.
[paper]
-
Maja Buljan, Jan Šnajder (2017).
Combining Linguistic Features for the Detection of Croatian Multiword Expressions.
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), Valencia, 194–199.
[paper]
[data]
-
Maria Pia di Buono, Martin Tutek, Jan Šnajder, Goran Glavaš, Bojana Dalbelo Bašić, Nataša Milic-Frayling (2017).
Two Layers of Annotation for Representing Event Mentions in News Stories.
Proceedings of the 11th Linguistic Annotation Workshop (LAW 2017), Valencia, 82–90.
[paper]
2016
-
Krešimir Baksa, Dino Dolović, Goran Glavaš, Jan Šnajder (2016).
Tagging Named Entities in Croatian Tweets.
Slovenščina 2.0, 4(1): 20–41.
[paper]
-
Jan Šnajder (2016).
Social Media Argumentation Mining: The Quest for Deliberateness in Raucousness.
Paper presented at Dagstuhl Seminar 16161 - Natural Language Argumentation: Mining, Processing, and Reasoning over Textual Arguments, Dagstuhl.
[paper]
-
Sebastian Padó, Aurelie Herbelot, Max Kisselew, Jan Šnajder (2016).
Predictability of Distributional Semantics in Derivational Word Formation.
Proceedings of the 26th International Conference on Computational Linguistics (COLING 2016), Osaka, 1285–1296.
[paper]
[data]
-
Sebastian Padó, Jan Šnajder, Jason Utt, Britta Zeller (2016).
Smoothing Syntax-Based Semantic Spaces: Let The Winner Take It All.
Proceedings of the 13th Conference on Natural Language Processing (KONVENS 2016), Bochum, 186–191.
[paper]
-
Federico Cerutti, Alexis Palmer, Ariel Rosenfeld, Jan Šnajder, Francesca Toni (2016).
A Pilot Study in Using Argumentation Frameworks for Online Debates.
The First International Workshop on Systems and Algorithms for Formal Argumentation (SAFA 2016), Potsdam, 63–74.
[paper]
-
Martin Tutek, Goran Glavaš, Jan Šnajder, Nataša Milic-Frayling, Bojana Dalbelo Bašić (2016).
Detecting and Ranking Conceptual Links between Texts.
Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM 2016), Indianapolis, 2077–2080.
[paper]
-
Filip Boltužić and Jan Šnajder (2016).
Fill the Gap! Analyzing Implicit Premises between Claims from Online Debates.
Proceedings of the 3rd Workshop on Argumentation Mining (ArgMining 2016), ACL 2016, Berlin, 124–133.
[paper]
[data]
-
Damir Korenčić, Marijana Grbeša-Zenzerović, Jan Šnajder (2016).
Topics and their Salience in the 2015 Parliamentary Election in Croatia: A Topic Model based Analysis of the Media Agenda.
Proceedings of the International Conference on the Advances in Computational Analysis of Political Text (PolText 2016), Dubrovnik, 48–54.
[paper]
- Mladen Karan, Jan Šnajder, Daniela Širinić, Goran Glavaš (2016).
Analysis of Policy Agendas: Lessons Learned from Automatic Topic Classification of Croatian Political Texts.
Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH 2016), ACL 2016, Berlin, 12–21.
[paper]
-
Martin Tutek, Ivan Sekulić, Paula Gombar, Ivan Paljak, Filip Čulinović, Filip Boltužić, Mladen Karan, Domagoj Alagić and Jan Šnajder (2016).
TakeLab at SemEval-2016 Task 6: Stance Classification in Tweets Using a Genetic Algorithm Based Ensemble.
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016), NAACL 2016, San Diego, 476–480.
[paper]
-
Mladen Karan, Jan Šnajder (2016).
FAQIR: A Frequently Asked Questions Retrieval Test Collection.
Proceedings of the 19th International Conference on Text, Speech and Dialogue (TSD 2016), Brno, 74–81.
[paper]
[data]
-
Domagoj Alagić and Jan Šnajder (2016).
Cro36WSD: A Lexical Sample for Croatian Word Sense Disambiguation.
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, 1689–1694.
[paper]
[data]
-
Marko Bekavac and Jan Šnajder (2016).
Graph-Based Induction of Word Senses in Croatian.
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, 3014–3018.
[paper]
[data]
-
Ivan Sekulić and Jan Šnajder (2016).
VerbCROcean: A Repository of Fine-Grained Semantic Verb Relations for Croatian.
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, 2676–2681.
[paper]
[data]
2015
- Jan Šnajder and Petra Almić (2015).
Modeling Semantic Compositionality of Croatian Multiword Expressions.
Informatica, 39(3): 301–309.
[paper]
[data]
-
Damir Korenčić, Strahil Ristov, Jan Šnajder (2015).
Getting the Agenda Right: Measuring Media Agenda using Topic Models.
Proceedings of the 2015 Workshop on Topic Models: Post-Processing and Applications (TM'15), Melbourne, 61–66.
[paper]
[data]
-
Domagoj Alagić and Jan Šnajder (2015).
Experiments on Active Learning for Croatian Word Sense Disambiguation.
Proceedings of the 5th Workshop on Balto-Slavic Natural Language Processing (BSNLP 2015), Hissar, 49–58.
[paper]
[slides]
[data]
-
Goran Glavaš and Jan Šnajder (2015).
Resolving Entity Coreference in Croatian with a Constrained Mention-Pair Model.
Proceedings of the 5th Workshop on Balto-Slavic Natural Language Processing (BSNLP 2015), Hissar, 17–23.
[paper]
[slides]
[data]
-
Mladen Karan and Jan Šnajder (2015).
Evaluation of Manual Query Expansion Rules on a Domain Specific FAQ Collection.
Proceedings of the 6th Conference of the CLEF Association (CLEF'15), Toulouse, 248–253.
[paper]
-
Filip Boltužić and Jan Šnajder (2015).
Identifying Prominent Arguments in Online Debates Using Semantic Textual Similarity.
Proceedings of the 2nd Workshop on Argumentation Mining (ArgMining 2015), NAACL 2015, Denver, 110–115.
[paper]
[slides]
-
Sebastian Padó, Alexis Palmer, Max Kisselew, Jan Šnajder (2015).
Measuring Semantic Content To Assess Asymmetry in Derivation.
Proceedings of the IWCS 2015 Workshop on Advances in Distributional Semantics, London.
[paper]
-
Sebastian Padó, Britta Zeller, Jan Šnajder (2015).
Morphological Priming in German: The Word is Not Enough (Or Is It?).
Proceedings of NetWordS 2015, Pisa, 42–45.
[paper]
-
Max Kisselew, Sebastian Padó, Alexis Palmer, Jan Šnajder (2015).
Obtaining a Better Understanding of Distributional Models of German Derivational Morphology.
Proceedings of the 11th International Conference on Computational Semantics (IWCS 2015), London, 58–63.
[paper]
-
Mladen Karan, Goran Glavaš, Jan Šnajder, Bojana Dalbelo Bašić, Ivan Vulić, Marie-Francine Moens (2015).
TKLBLIIR: Detecting Twitter Paraphrases with TweetingJay.
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), Denver, 70–74.
[paper]
[data]
2014
-
Goran Glavaš and Jan Šnajder (2014).
Constructing Coherent Event Hierarchies from News Stories.
Proceedings of the Workshop on Graph-based Methods for Natural Language Processing (TextGraphs-9) at the 19th Conference on Empirical Methods in Natural Language Processing (EMNLP'14), Doha, 1–5.
[paper]
- Sebastian Padó, Britta Zeller, Jan Šnajder (2014).
Towards Semantic Validation of a Derivational Lexicon.
Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014), Dublin, 1728–1739.
[paper]
- Filip Boltužić and Jan Šnajder (2014).
Back up your Stance: Recognizing Arguments in Online Discussions.
Proceedings of the First Workshop on Argumentation Mining (ArgMining 2014), Association for Computational Linguistics, ACL 2014, Baltimore, 49–58.
[paper]
[slides]
[data]
- Goran Glavaš and Jan Šnajder (2014).
Construction and Evaluation of Event Graphs.
Natural Language Engineering, 21(4): 607–652.
[paper]
[data]
- Goran Glavaš and Jan Šnajder (2014).
Event Graphs for Information Retrieval and Multi-Document Summarization.
Expert Systems with Applications, 41(15): 6904–6916.
[paper]
- Jan Šnajder (2014).
DerivBase.hr: A High-Coverage Derivational Morphology Resource for Croatian.
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, 3371–3377.
[paper]
[poster]
[data]
- Goran Glavaš, Jan Šnajder, Marie-Francine Moens, Parisa Kordjamshidi (2014).
HiEve: A Corpus for Extracting Event Hierarchies from News Stories.
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, 3678–3683.
[paper]
- Frane Šarić, Bojana Dalbelo Bašić, Marie-Francine Moens, Jan Šnajder (2014).
Multi-Label Classification of Croatian Legal Documents using EuroVoc Thesaurus.
Proceedings of SPLeT-Semantic Processing of Legal Texts: Legal Resources and Access to Law workshop, LREC'14, Reykjavik, 7–12.
[paper]
[data]
- Petra Almić and Jan Šnajder (2014).
Determining the Semantic Compositionality of Croatian Multi-Word Expressions.
Proceedings of the Ninth Language Technologies Conference, Information Society (IS-JT 2014), Ljubljana, 32–37.
[paper]
[slides]
[data]
-
Krešimir Baksa, Dino Dolović, Goran Glavaš, Jan Šnajder (2014).
Named Entity Recognition in Croatian Tweets.
Proceedings of the Ninth Language Technologies Conference, Information Society (IS-JT 2014), Ljubljana, 85–89.
[paper]
-
Siniša Biđin, Jan Šnajder, Goran Glavaš (2014).
Predicting Croatian Phrase Sentiment Using a Deep Matrix-Vector Model.
Proceedings of the Ninth Language Technologies Conference, Information Society (IS-JT 2014), Ljubljana, 95–98.
[paper]
-
Luka Skukan, Goran Glavaš, Jan Šnajder (2014).
HeidelTime.Hr: Extracting and Normalizing Temporal Expressions in Croatian.
Proceedings of the Ninth Language Technologies Conference, Information Society (IS-JT 2014), Ljubljana, 99–103.
[paper]
[data]
-
Leo Zuanović, Mladen Karan, Jan Šnajder (2014).
Experiments with Neural Word Embeddings for Croatian.
Proceedings of the Ninth Language Technologies Conference, Information Society (IS-JT 2014), Ljubljana, 69–72.
[paper]
2013
-
Jan Šnajder (2013).
Models for Predicting the Inflectional Paradigm of Unknown Croatian Words.
Slovenščina 2.0, 1(2): 1–34.
[paper]
-
Goran Glavaš and Jan Šnajder (2013).
Event-Centered Information Retrieval Using Kernels on Event Graphs.
Proceedings of the 8th Workshop on Graph-Based Methods in Natural Language Processing (TextGraphs-8), Washington, 1–5.
[paper]
[data]
-
Britta Zeller, Jan Šnajder, Sebastian Padó (2013).
DErivBase: Inducing and Evaluating a Derivational Morphology Resource for German.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), Sofia, 1201–1211.
[paper]
[slides]
[data]
-
Sebastian Padó, Jan Šnajder, Britta Zeller (2013).
Derivational Smoothing for Syntactic Distributional Semantics.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), Sofia, 731–735.
[paper]
[slides]
-
Jan Šnajder, Sebastian Padó, Željko Agić (2013).
Building and Evaluating a Distributional Memory for Croatian.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), Sofia, 784–789.
[paper]
[slides] [data]
-
Goran Glavaš and Jan Šnajder (2013).
Recognizing Identical Events with Graph Kernels.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), Sofia, 797–803.
[paper]
[data]
-
Mladen Karan, Goran Glavaš, Frane Šarić, Jan Šnajder, Jure Mijić, Artur Šilić, Bojana Dalbelo Bašić (2013).
CroNER: Recognizing Named Entities in Croatian Using Conditional Random Fields.
Informatica, 37: 165–172.
[paper]
[tool]
-
Mladen Karan, Lovro Žmak, Jan Šnajder (2013).
Frequently Asked Questions Retrieval for Croatian Based on Semantic Textual Similarity.
Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing (BSNLP 2013), Sofia, 24–33.
[paper]
[slides]
[data]
-
Goran Glavaš, Damir Korenčić, Jan Šnajder (2013).
Aspect-Oriented Opinion Mining from User Reviews in Croatian.
Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing (BSNLP 2013), Sofia, 18–23.
[paper]
[slides]
[data]
-
Marko Bekavac and Jan Šnajder (2013).
GPKEX: Genetically Programmed Keyphrase Extraction from Croatian Texts.
Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing (BSNLP 2013), Sofia, 43–47.
[paper]
[slides] [data]
-
Goran Glavaš and Jan Šnajder (2013).
Exploring Coreference Uncertainty of Generically Extracted Event Mentions.
Lecture Notes in Computer Science (Computational Linguistics and Intelligent Text Processing), 7816, 408–422.
[data]
2012
-
Frane Šarić, Goran Glavaš, Mladen Karan, Jan Šnajder, Bojana Dalbelo Bašić (2012).
TakeLab: Systems for Measuring Semantic Text Similarity.
Proceedings of the First Joint Conference on Lexical and Computational Semantics (*SEM 2012), Montreal, 441–448.
[paper]
[data]
-
Jan Šnajder (2012).
Guessing the Correct Inflectional Paradigm of Unknown Croatian Words.
Proceedings of the Eighth Language Technologies Conference (IS-JT 2012), Ljubljana, 185–190.
[paper]
[slides]
-
Mladen Karan, Jan Šnajder, Bojana Dalbelo Bašić (2012).
Distributional Semantics Approach to Detecting Synonyms in Croatian Language.
Proceedings of the Eighth Language Technologies Conference (IS-JT 2012), Ljubljana, 111–116.
[paper]
- Tin Franović and Jan Šnajder (2012).
Speech Act Based Classification of Email Messages in Croatian Language.
Proceedings of the Eighth Language Technologies Conference (IS-JT 2012), Ljubljana, 69–72.
[paper]
[slides]
-
Goran Glavaš, Mladen Karan, Frane Šarić, Jan Šnajder, Jure Mijić, Artur Šilić, Bojana Dalbelo Bašić (2012).
CroNER: A State-of-the-Art Named Entity Recognition and Classification for Croatian.
Proceedings of the Eighth Language Technologies Conference (IS-JT 2012), Ljubljana, 73–78.
[paper]
[slides]
-
Mladen Marović, Jan Šnajder, Goran Glavaš (2012).
Event and Temporal Relation Extraction from Croatian Newspaper Texts.
Proceedings of the Eighth Language Technologies Conference (IS-JT 2012), Ljubljana, 141–146.
[paper]
-
Goran Glavaš, Jan Šnajder, Bojana Dalbelo Bašić (2012).
Are You for Real? Learning Event Factuality in Croatian Texts.
Proceedings of the Conference on Data Mining and Data Warehouses (SiKDD 2012), Ljubljana.
[paper]
[slides]
[talk]
-
Goran Glavaš, Jan Šnajder, Bojana Dalbelo Bašić (2012).
Semi-Supervised Acquisition of Croatian Sentiment Lexicon.
Lecture Notes in Artificial Intelligence (Text, Speech and Dialogue, 15th International Conference, TSD 2012, Brno), 7499: 166–173.
[data]
-
Hrvoje Peradin, Jan Šnajder, Bojana Dalbelo Bašić (2012).
Towards a Constraint Grammar Based Morphological Tagger for Croatian.
Lecture Notes in Artificial Intelligence (Text, Speech and Dialogue, 15th International Conference, TSD 2012, Brno), 7499: 174–182.
-
Frane Šarić, Jan Šnajder, Bojana Dalbelo Bašić (2012).
Optimizing Sentence Boundary Detection for Croatian.
Lecture Notes in Artificial Intelligence (Text, Speech and Dialogue, 15th International Conference, TSD 2012, Brno), 7499: 105–111.
[data]
-
Goran Glavaš, Krešimir Fertalj, Jan Šnajder (2012).
Syntax-based Requirements Analysis for Data-Driven Application Development.
Proceedings of Natural Language Processing and Information Systems (NLDB 2012), Groningen, 339–344.
[paper]
-
Goran Glavaš, Jan Šnajder, Bojana Dalbelo Bašić (2012).
Experiments on Hybrid Corpus-Based Sentiment Lexicon Acquisition.
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), Avignon, 1–9.
[paper]
[data]
-
Mladen Karan, Jan Šnajder, Bojana Dalbelo Bašić (2012).
Evaluation of Classification Algorithms and Features for Collocation Extraction in Croatian.
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), Istanbul, 657–662.
[paper]
2011
-
Tomislav Lombarović, Jan Šnajder, Bojana Dalbelo Bašić (2011).
Question Classification for a Croatian QA System.
Lecture Notes in Artificial Intelligence (Third Int. Workshop on Balto-Slavonic Natural Language Processing), 6836, 403–410.
[slides]
-
Vedrana Janković, Jan Šnajder, Bojana Dalbelo Bašić (2011).
Random Indexing Distributional Semantic Models for Croatian Language.
Lecture Notes in Artificial Intelligence (Third Int. Workshop on Balto-Slavonic Natural Language Processing), 6836, 411–418.
[data]
-
Josip Saratlija, Jan Šnajder, Bojana Dalbelo Bašić (2011).
Unsupervised Topic-Oriented Keyphrase Extraction and its Application to Croatian.
Lecture Notes in Artificial Intelligence (14th International Conference on Text, Speech and Dialogue), 6836, 340–347.
2010
-
Jan Šnajder and Bojana Dalbelo Bašić (2010).
A Computational Model of Croatian Derivational Morphology.
Proceedings of the Seventh International Conference on Formal Approaches to South Slavic and Balkan Languages (FASSBL'10), Dubrovnik, 109–117.
[paper]
-
Jure Mijić, Jan Šnajder, Bojana Dalbelo Bašić (2010).
Robust Keyphrase Extraction for a Large-Scale Croatian News Production System.
Proceedings of the Seventh International Conference on Formal Approaches to South Slavic and Balkan Languages (FASSBL'10), Dubrovnik, 59–66.
[paper]
-
Mladen Mikša, Jan Šnajder, Bojana Dalbelo Bašić (2010).
Correcting Word Merge Errors in Croatian Texts.
Proceedings of the Seventh International Conference on Formal Approaches to South Slavic and Balkan Languages (FASSBL'10), Dubrovnik, 67–75.
[paper]
-
Mladen Marović, Mladen Mikša, Jan Šnajder, Bojana Dalbelo Bašić (2010).
Croatian OCR Error Correction Using Character Confusions and Language Modelling.
Proceedings of the 21st Central European Conference on Information and Intelligent Systems (CECIIS 2010), Varaždin, 281–288.
[paper]
-
Sanja Seljan, Marko Tadić, Željko Agić, Jan Šnajder, Bojana Dalbelo Bašić, Vjekoslav Osmann (2010).
Corpus Aligner (CorAl) Evaluation on English-Croatian Parallel Corpora.
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010), Valletta, 3481–3484.
[paper]
-
Saša Petrović, Jan Šnajder, Bojana Dalbelo Bašić (2010).
Extending Lexical Association Measures for Collocation Extraction.
Computer Speech and Language, 24(2): 383–394.
2009
-
Davor Delač, Zoran Krleža, Bojana Dalbelo Bašić, Jan Šnajder, Frane Šarić (2009).
TermeX: A Tool for Collocation Extraction.
Lecture Notes in Computer Science (Computational Linguistics and Intelligent Text Processing), 5449, 149–157.
-
Jan Šnajder, Bojana Dalbelo Bašić (2009).
String Distance-Based Stemming of the Highly Inflected Croatian Language.
Proceedings of Recent Advances in Natural Language Processing (RANLP 2009), Borovets, 411–415.
[paper]
- Nikola Šantić, Jan Šnajder, Bojana Dalbelo Bašić (2009).
Automatic Diacritics Restoration in Croatian Texts.
Proceedings of The Future of Information Sciences, Digital Resources and Knowledge Sharing (INFuture 2009), Zagreb, 309–318.
[paper]
- Renee Ahel, Bojana Dalbelo Bašić, Jan Šnajder (2009).
Automatic Keyphrase Extraction from Croatian Newspaper Articles.
Proceedings of The Future of Information Sciences, Digital Resources and Knowledge Sharing (INFuture 2009), Zagreb, 207–218.
[paper]
-
Sanja Seljan, Bojana Dalbelo Bašić, Jan Šnajder, Davor Delač, Matija Šamec-Gjurin, Dina Crnec (2009).
Comparative Analysis of Automatic Term and Collocation Extraction.
Proceedings of The Future of Information Sciences, Digital Resources and Knowledge Sharing (INFuture 2009), Zagreb, 219–228.
[paper]
-
Marko Čupić, Jan Šnajder, Bojana Dalbelo Bašić (2009).
Post-Test Analysis of Automatically Generated Multiple Choice Exams: A Case Study.
Proceedings of ICL 2009, Vienna: International Association of Online Engineering (published electronically).
[paper]
- Jan Šnajder, Bojana Dalbelo Bašić, Marko Tadić (2009).
Lexicon-Based Morphological Normalisation and its Application to Croatian Language.
Technologies for the Processing and Retrieval of Semi-Structured Documents: Experience from the CADIAL Project. Zagreb: Croatian Language Technologies Society, 23–80.
2008
-
Jan Šnajder, Bojana Dalbelo Bašić (2008).
Higher-order Functional Representation of Croatian Inflectional Morphology.
Proceedings of the Sixth International Conference on Formal Approaches to South Slavic and Balkan Languages (FASSBL'08), Dubrovnik, 121–130.
[paper]
-
Jan Šnajder, Bojana Dalbelo Bašić, Saša Petrović, Ivan Sikirić (2008).
Evolving New Lexical Association Measures Using Genetic Programming.
Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL 2008), Ohio, 181–184.
[paper]
-
Jure Mijić, Bojana Dalbelo Bašić, Jan Šnajder (2008).
Building a Search Engine Model with Morphological Normalization Support.
Proceedings of the 30th International Conference on Information Technology Interfaces (ITI 2008), Cavtat, 619–624.
[paper]
-
Jan Šnajder, Marko Čupić, Bojana Dalbelo Bašić, Saša Petrović (2008).
Enthusiast: An Authoring Tool for Automatic Generation of Paper-and-Pencil Multiple-Choice Tests.
Proceedings of ICL 2008, Villach (published electronically).
-
Jan Šnajder, Bojana Dalbelo Bašić, Marko Tadić (2008).
Automatic Acquisition of Inflectional Lexica for Morphological Normalisation.
Information Processing and Management, 44(5): 1720–1731.
-
Mislav Malenica, Tom Šmuc, Jan Šnajder, Bojana Dalbelo Bašić (2008).
Language Morphology Offset: Text Classification on a Croatian-English Parallel Corpus.
Information Processing and Management, 44(1): 325–339.
2007 and earlier
-
Saša Petrović, Jan Šnajder, Bojana Dalbelo Bašić, Mladen Kolar (2006).
Comparison of Collocation Extraction Measures for Document Indexing.
Journal of Computing and Information Technology, 14(4): 321–327.
[paper]
-
Mladen Kolar, Igor Vukmirović, Bojana Dalbelo Bašić, Jan Šnajder (2005).
Computer-Aided Document Indexing System.
Journal of Computing and Information Technology, 13(4): 299–305.
[paper]
-
Artur Šilić, Frane Šarić, Bojana Dalbelo Bašić, Jan Šnajder (2007).
TMT: Object-Oriented Text Classification Library.
Proceedings of the 29th International Conference on Information Technology Interfaces (ITI 2007), Cavtat, 559–566.
[paper]
-
Frane Šarić, Jan Šnajder, Bojana Dalbelo Bašić, Hrvoje Eklić (2005).
Enhanced Thesaurus Terms Extraction for Document Indexing.
Proceedings of the 27th International Conference on Information Technology Interfaces (ITI 2005), Cavtat, 227–232.
-
Slobodan Ribarić, Jan Šnajder (2005).
Mapping Petri Net-Based Temporal Knowledge Representation Scheme into CP-Net Model.
Proceedings of the 28th International Convention MIPRO 2005, Rijeka, 134–139.
[ps]
-
Marko Čupić, Jan Šnajder, Bojana Dalbelo Bašić (2003).
Educational Interactive Software as a Support to the Teaching of Artificial Neural Network Methodology Applied to a Classification Problem.
Proceedings of the 2nd International Conference on Multimedia and Information & Communication Technologies in Education (m-ICTE2003), Badajoz, 1975–1979.
-
Jan Šnajder, Mario Kovač, Bojana Dalbelo Bašić (2001).
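Analiza vremenske redundancije i kompenzacija pokreta kod digitalnog videa korištenjem programskog sustava Mathematica [Analysis of Temporal Redundancy and Motion Compensation in Digital Video Using the Mathematica Software System].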
Prvi znanstveno-stručni skup Programski sustav Mathematica u znanosti, tehnologiji i obrazovanju (PrimMath 2001), Zagreb, 267–285.
Software & Data
Students
Current PhD Students
- Domagoj Alagić - representation learning
- Filip Boltužić - argumentation mining
- Matej Gjurković - author profiling
- Zoran Medić - citation analysis
- Martin Tutek - representation learning
Current MA Students
- Ivan Anić
- Ivan Crnomarković
- Fabijan Čorak
- Martin Čolja
- Mihovil Ilakovac
- Dorian Ivanković
- Mislav Jurić
- Vinko Kašljević
- Roman Kerčmar
- Toni Kukurin
- Luka Markušić
- Mate Mijolović
- Leo Obadić
- Gregor Orlić
- Kristijan Palić
- Marin Sokol
- Filip Šaina
- Doria Šarić
- Dominik Čubelić
- Ivan Lovrečić
- Juraj Vladika
Current BA Students
- Rino Čala
- Ivan Križanić
- Vedran Kurdija
- Patrik Matošević
- Ante Miličević
- Marin Petričević
- Ante Pušić
- Mario Zec
- Josip Žaja
Completed PhD Theses
-
Damir Korenčić (with Dr. Strahil Ristov).
Computational Methods for Modelling and Analysis of the Media Agenda Based on Machine Learning.
2019.
-
Tamara Sladoljev-Agejev (with Prof. Svjetlana Kolić-Vehovec).
Expository Text Processing in L2.
2018.
-
Mladen Karan.
Computer-Aided Construction and Semantic Search of Question and Answer Collections.
2017.
-
Vedran Galetić (with Prof. Marijan Palmović).
Formalization and Quantification of a Cognitively Motivated Conceptual Space Model Based on The Prototype Theory.
2016.
-
Goran Glavaš.
Text Information and Retrieval Based on Event Graphs.
2014.
Completed MA Theses
- Jure Baban. Debiasing Methods for Demographic Information in Textual Data. 2019.
- Mihaela Bošnjak (with Dr. Mladen Karan). Personality Type Detection using Transfer Learning and Language Models. 2019.
- Bruno Gavranović. Compositional Deep Learning. 2019.
- Viktor Golem. Machine Learning Models for Detecting Psychological Illnesses and Disorders of Text Authors. 2019.
- Robert Injac. Predicting Text Readability using Natural Language Processing Methods. 2019.
- Marin Krešo (with Dr. Mladen Karan). Text Classification for Croatian using Text Embeddings Derived from Contextualized Language Models. 2019.
- Marin Kukovačec. Deep Learning Models for Automated Document Summarization. 2019.
- David Lozić. A System for Extrinsic and Intrinsic Plagiarism Detection in Student Theses. 2019.
- Ivan Mršić. Recognition and Classification of Mental Health Issues from Short Text using Multi-Task Learning. 2019.
- Lukrecija Puljić. Computational Models for Determining the Personality Type of Text Authors. 2019.
- Tome Radman. Resource-Scarce Transfer Learning Using Language Model Pretraining. 2019.
- Antonio Šajatović. Bayesian Deep Learning for Text Classification. 2019.
- Bruno Šarlija (with Dr. Mladen Karan). Learning-to-Rank Model for Croatian based on Contextualized Multilingual Language Models. 2019.
- Leonard Volarić-Horvat. Computational Stylometry Analysis of Croatian Parliamentary Discussions. 2019.
- Marin Biloš. Deep Learning Models for Reading Comprehension. 2018.
- Ana Brassard. Text-based Religious Affiliation Prediction. 2018.
- Luka Dulčić. Customer Support Chatbot Based on Deep Learning. 2018.
- Bartol Freškura. Customer Support Chatbot Based on Deep Learning. 2018.
- Martin Gluhak. Extracting Semantic Relations between Entities from Scientific Texts. 2018.
- Paula Gombar. Recency Ranking Models for Web Search. 2018.
- Tin Kuculo. A Neural-Based Text Generation System. 2018.
- Mihael Nikić. Machine Translation Model Based on Convolutional Neural Networks. 2018.
- Ivan Paljak. Semantic Taxonomy Learning from Online Debates. 2018.
- Ivan Sekulić. Text-Based Mental Disorder Prediction from Online Discussions Texts. 2018.
- Fredi Šarić. Event Linking in Large News Collections. 2018.
- Jasmin Zukić. A Self-Learning Web Application for a Word Guessing Game Implemented in Haskell. 2018.
- Filip Čulinović. Evaluating Croatian Language Word Representations. 2017.
- Tomislav Marinković. Deep Learning Models for the Analysis of User Comments on Social Networks. 2017.
- Matej Paradžik. Domain Adaptation for Sentiment Analysis from Text. 2017.
- Tena Perak. Semantic Analysis of Math Word Problems. 2017.
- Luka Skukan. Application of Compositional Distributional Semantics for Semantic Text Similarity. 2017.
- Maja Buljan. Multiword Identification Based on the Combination of Linguistic Features. 2016.
- Vjeran Crnjak. Learning to Search for Solving Natural Language Processing Tasks. 2016.
- Zoran Medić. Compositional Distributional Semantics Based on the Lexical Function Model. 2016.
- Dino Radaković. A Joint Model for Named Entity Relation Extraction. 2016.
- Sven Vidak. Deep Learning for Language Modeling of the Croatian Language. 2016.
- Toni Antunović. Automated Extraction of Bilingual Lexicons Based on Semantic Vector Spaces. 2015.
- Krešimir Baksa. Shallow Semantic Parsing of Croatian Texts. 2015.
- Dino Dolović. Sentiment Analysis in Tweets in Croatian Language. 2015.
- Goran Gašić. Deep Learning of Word Embeddings for Tagging Models for Croatian Texts. 2015.
- Lana Lisjak. Recognizing Textual Entailment in Croatian Texts. 2015.
- Hermina Petric Maretić. Project Proposals Analysis using Statistical Natural Language Processing. 2015.
- Mihael Šafarić. Feature Selection and Document Representation Methods for Text Classification. 2015.
- Petra Almić. A Model for Determining Semantic Compositionality of Croatian Multi-Word Expressions. 2014.
- Marko Bekavac. Word Sense Induction and Discrimination Model for Croatian Words. 2014.
- Petra Bevandić. Optimizing Dependency Parsing Parameters for Croatian Language. 2014.
- Siniša Biđin. Using Deep Learning for Sentiment Analysis of Croatian Expressions. 2014.
- Luka Krajcar. Sentiment Analysis of Tweets in Croatian Language. 2014.
- Lovro Rožić (with Mladen Vuković). Functional Programming. 2014.
- Martin Tutek. Multi-label Document Classification using EuroVoc Thesaurus. 2014.
- Leo Zuanović. Recurrent Neural Network Based Model of Croatian Language. 2014.
- Filip Petkovski. Application of Partial Membership Models to Keyphrase Extraction from Croatian Documents. 2013.
- Tin Franović. Classification of Email Importance Based on Speech Acts. 2013.
- Matija Hanževački. Coreference Resolution in Croatian Texts. 2013.
- Josip Bakić. Automatic Content Extraction from Web Pages. 2012.
- Sonja Grđan. Application of Machine Learning Methods for EEG-Based Brain-Computer Interface. 2012.
- Ante Kegalj. Sentiment Analysis Based on Prior Word Polarity. 2012.
- Ivan Krišto. Using Machine Learning Methods to Improve Document Retrieval. 2012.
- Tomislav Lombarović. Named Entity Recognition and Classification for Text in Croatian Language. 2012.
- Mladen Marović. Event and Temporal Relation Extraction in Croatian Language Texts. 2012.
- Hrvoje Peradin. Constraint Grammar-based Parsing of Croatian Texts. 2012.
- Veljko Srdarević. Text Report Generation Based on Structured Data. 2012.
- Fran Dragomanović (with Prof. Bojana Dalbelo Bašić). Acronym Extraction in Croatian Language. 2011.
- Zoran Hranj. Unsupervised Coreference Resolution. 2011.
- Vedrana Janković. Computational Models of Distributional Lexical Semantics in Croatian Language. 2011.
- Ivan Kmetović. Matching Co-referent Named Entities Using Machine Learning. 2011.
- Slavko Kručaj. Applying Machine Learning Methods to User Review Summarization. 2011.
- Ivan Kusalić. Application of Topic Models to Analysis of Croatian Documents. 2011.
- Ognjen Lajšić. Grammar and Style Checker for Croatian Language. 2011.
- Vladimir Manzin. Computer Agents for Poker. 2011.
- Vjekoslav Osmann. Tagging Parts of Speech in Croatian Texts. 2011.
- Paško Pajdek. Deep Generative Models for Semantic Document Clustering. 2011.
- Josip Saratlija. Unsupervised Parser for Croatian Language. 2011.
- Nikola Šantić. Automatic Paraphrasing of Croatian Expressions and Sentences. 2011.
- Matea Biočić (with Prof. Bojana Dalbelo Bašić). Word Sense Discrimination Using Expectation Maximization Algorithm. 2010.
- Zlatan Hot (with Prof. Bojana Dalbelo Bašić). A Stemming Algorithm Based on String Clustering. 2010.
- Marin Japec (with Prof. Bojana Dalbelo Bašić). System for Organizing and Sharing Knowledge Based on Topic Maps. 2010.
- Matija Lacković (with Prof. Bojana Dalbelo Bašić). Program Environment for Execution of Tournaments for Game Playing Algorithms. 2010.
- Nikola Novak (with Prof. Bojana Dalbelo Bašić). Implementation of a Game Simulator and Checkers Game-playing Algorithms. 2010.
- Ivan Šolta (with Prof. Bojana Dalbelo Bašić). Determining Semantic Orientation of Subjective Words and Phrases. 2010.
- Davor Delač (with Prof. Bojana Dalbelo Bašić). Collocation Extraction from Corpus. 2009.
- Lovro Žmak (with Prof. Bojana Dalbelo Bašić). FAQ Retrieval System for Croatian Language. 2009.
- Srđan Vuković (with Prof. Bojana Dalbelo Bašić). A Heuristic Algorithm for Matching of Address Data. 2008.
Completed BA Theses
- Livio Benčik. Interpreting Neural Models for Natural Language Processing. 2019.
- Martin Čolja. Machine Learning Models for Toxic Comments Detection on the Internet. 2019.
- Dominik Čubelić. Semantic Segmentation of Invoice Text. 2019.
- Fran Grgić. Classification of Importance and Urgency of Push Notifications using Machine Learning. 2019.
- Ivan Lovrečić. Author Profiling of Social Media Users. 2019.
- Niko Palić. Product Review Sentiment Analysis. 2019.
- Marin Petričević. Deep Learning Models for Predicting Users' Comment Controversiality. 2019.
- Juraj Vladika. Deep Learning Models for Extractive Document Summarization. 2019.
- Ivan Crnomarković. Machine Learning Models for Narrative Text Understanding. 2018.
- Fabijan Čorak. Hate Speech Detection on Social Networks Using Machine Learning. 2018.
- Mihovil Ilakovac. Machine Learning Models for Sarcasm Detection in Users' Comments. 2018.
- Vinko Kašljević. Cross-Domain Sentiment Analysis Methods for Croatian Language. 2018.
- Roman Kerčmar. Sentiment Analysis from Customer Reviews of Hospitality and Catering Businesses. 2018.
- Mate Mijolović. Deep Learning Models for Named Entity Recognition and Classification. 2018.
- Gregor Orlić. Machine Learning Models for Emoji Prediction from Text. 2018.
- Josip Torić. Computational Statistical Analysis of the Language of Religious Discussions on Internet Forums. 2018.
- Fran-Andrija Arbanas. Question Type Classification for a Natural Language Database Interface. 2017.
- Marin Kukovačec. Automated Sarcasm Detection in Social Network Users' Comments. 2017.
- Toni Kukurin. Claim and Stance Classification in Online Discussions Using Machine Learning. 2017.
- David Lozić. Intrinsic Plagiarism Detection in Student Theses. 2017.
- Juraj Malenica. Question Context Prediction for an Interactive Natural Language Database Interface. 2017.
- Ivan Mršić. Entity Recognition and Classification for a Natural Language Database Interface. 2017.
- Lukrecija Puljić. Author Profiling on Social Networks Using Machine Learning. 2017.
- Filip Šaina. Sentiment Summarization from Student Course Questionnaires. 2017.
- Antonio Šajatović. Predicting Newsworthiness of News Stories Using Machine Learning. 2017.
- Doria Šarić. Computational Analysis of the Similarity of Math Word Problems. 2017.
- Ivan Tokić. Cross-Lingual Plagiarism Detection from Wikipedia. 2017.
- Bartol Freškura. Application of Deep Learning for Stance Detection in User Comments. 2016.
- Bruno Gavranović. Application of Deep Learning for Sentiment Analysis. 2016.
- Filip Hrenić. Detection of Inappropriate Messages in Online Chats. 2016.
- Marin Kačan. Detecting Lexical Transfer Errors of Second Language Learners. 2016.
- Mihael Nikić. Application of Machine Learning for Topic-Based Sentiment Analysis. 2016.
- Stipan Mikulić. Use of Distributional Semantic Models in the Word Association Game. 2016.
- Filip Čulinović. Acquisition of Verb Classes from Corpus using Unsupervised Machine Learning. 2015.
- Paula Gombar. Contextual Sentiment Analysis of Croatian Expressions. 2015.
- Ivan Paljak. Stance Classification and Analysis in Online User Comments. 2015.
- Ivan Sekulić. Extraction of Semantic Verb Relations from Croatian Corpora. 2015.
- Jura Šlosel. Entity-Based Coherence Model for Croatian Texts. 2015.
- Vjeran Crnjak. Part-of-Speech Tagging for Croatian using Conditional Random Fields. 2014.
- Stjepan Glavina. Machine Learning of Document Classification Rules. 2014.
- Zoran Medić. Quotation Extraction from News Stories in Croatian Language. 2014.
- Matej Paradžik. Semi-Supervised Acquisition of Sentiment Polarity Lexicon. 2014.
- Dino Radaković. Applying Semantic Kernel Functions in Text Classification. 2014.
- Luka Skukan. Temporal Expression Tagging for Croatian Texts. 2014.
- Sandra Trkulja. Feature Construction and Selection for Document Classification in Croatian Language. 2014.
- Sven Vidak. Offensive Text Detection using Machine Learning Methods. 2014.
- Ivana Balažević. Document Clustering Using Self-organizing Neural Networks. 2012.
- Marko Bekavac. Application of Genetic Programming in Keyphrase Extraction. 2012.
- Petra Bevandić. Automatic Natural Language Identification. 2012.
- Goran Gašić. Automatic Tagging of Croatian Newswire Articles. 2012.
- Luka Krajcar. Error Correction in Texts Produced by Speech Recognition of Croatian. 2012.
- Zolik Nemet. Extraction of Acronyms from Corpus of Texts in Croatian Language. 2012.
- Roko Pancirov. Automatic Extraction of Bilingual Dictionaries Based on Wikipedia. 2012.
- Martin Tutek. Using Wikipedia for Automatic Word Sense Disambiguation. 2012.
- Leo Zuanović. Machine Learning of Croatian Lemmatization Rules. 2012.
- Siniša Biđin. A Controlled Natural Language Parser. 2011.
- Matija Hanževački. Temporal Expression Tagging in Croatian Texts. 2011.
- Ante Kegalj (with Prof. Bojana Dalbelo Bašić). Automated Sentence Boundary Detection. 2010.
- Tomislav Lombarović (with Prof. Bojana Dalbelo Bašić). Question Type Classification for Information Retrieval Systems. 2010.
- Mladen Marović (with Prof. Bojana Dalbelo Bašić). OCR Error Correction. 2010.
- Mladen Mikša (with Prof. Bojana Dalbelo Bašić). Correction of Merged Words Errors in Texts Obtained by Optical Character Recognition. 2010.
- Veljko Srdarević (with Prof. Bojana Dalbelo Bašić). Building a Stemming Algorithm Using Genetic Programming. 2010.
- Zoran Hranj (with Prof. Bojana Dalbelo Bašić). Structure-Based Web Page Comparison Algorithm. 2009.
- Ivan Karačić (with Prof. Bojana Dalbelo Bašić). Word Sense Discrimination. 2009.
- Ivan Kmetović (with Prof. Bojana Dalbelo Bašić). Keyword Extraction from Text Using Decision Trees. 2009.
- Ivan Krišto (with Prof. Bojana Dalbelo Bašić). Web Page Cleaning Techniques for Text Mining. 2009.
- Ognjen Lajšić (with Prof. Bojana Dalbelo Bašić). OCR Error Correction. 2009.
- Josip Saratlija (with Prof. Bojana Dalbelo Bašić). Keyword Extraction Based on Document Clustering. 2009.
- Nikola Šantić (with Prof. Bojana Dalbelo Bašić). Automatic Diacritics Restoration in Croatian Texts. 2009.
- Igor Šoš (with Prof. Bojana Dalbelo Bašić). Client Side of Distributed Linguistic Resource Annotator. 2009.
- Marin Japec (with Prof. Bojana Dalbelo Bašić). Dialogue System in Croatian Language. 2008.
- Željko Rumenjak (with Prof. Bojana Dalbelo Bašić). Distributed Linguistic Resource Annotator. 2008.
- Ivan Šolta (with Prof. Bojana Dalbelo Bašić). Query Correction Based on Levenshtein Distance. 2008.