Jan Šnajder

Jan Šnajder, PhD
Associate Professor

Text Analysis and Knowledge Engineering Lab
Faculty of Electrical Engineering and Computing
University of Zagreb, Croatia

Phone: +385 1 6129 653
Email: jan (dot) snajder (at) fer (dot) hr

LinkedIn   Scholar

I am an Associate Professor at the Faculty of Electrical Engineering and Computing (FER) at the University of Zagreb and a member of the Text Analysis and Knowledge Engineering Lab (TakeLab). My research interests are in natural language processing (NLP) and machine learning. My current focus is on lexical semantics, information extraction, and opinion mining. I am a fan of functional programming, Haskell in particular.

Short Bio

I received my MSc and PhD degrees in Computer Science from the University of Zagreb, Faculty of Electrical Engineering and Computing (UNIZG FER), Zagreb, Croatia, in 2006 and 2010, respectively. From 2002 I worked as a research assistant, and since 2016 I have been an Associate Professor at UNIZG FER. In 2012 and 2013 I was a visiting researcher at the Department of Computational Linguistics at Heidelberg University. In 2015 I was a visiting researcher at the NICT in Kyoto, and in 2014 and 2015 a visiting researcher at the IMS, Stuttgart University. In 2016 I was a visiting researcher at the Department of Computing and Information Systems, University of Melbourne.

Curriculum vitae

Software & Data


Current PhD Students

  1. Domagoj Alagić - representation learning
  2. Filip Boltužić - argumentation mining
  3. Matej Gjurković - author profiling
  4. Zoran Medić - citation analysis
  5. Martin Tutek - representation learning

Current MA Students

  1. Ivan Anić
  2. Ivan Crnomarković
  3. Fabijan Čorak
  4. Martin Čolja
  5. Mihovil Ilakovac
  6. Dorian Ivanković
  7. Mislav Jurić
  8. Vinko Kašljević
  9. Roman Kerčmar
  10. Toni Kukurin
  11. Luka Markušić
  12. Mate Mijolović
  13. Leo Obadić
  14. Gregor Orlić
  15. Kristijan Palić
  16. Marin Sokol
  17. Filip Šaina
  18. Doria Šarić
  19. Dominik Čubelić
  20. Ivan Lovrečić
  21. Juraj Vladika

Current BA Students

  1. Rino Čala
  2. Ivan Križanić
  3. Vedran Kurdija
  4. Patrik Matošević
  5. Ante Miličević
  6. Marin Petričević
  7. Ante Pušić
  8. Mario Zec
  9. Josip Žaja

Completed PhD Theses

  1. Damir Korenčić (with Dr. Strahil Ristov). Computational Methods for Modelling and Analysis of the Media Agenda Based on Machine Learning. 2019.
  2. Tamara Sladoljev-Agejev (with Prof. Svjetlana Kolić-Vehovec). Expository Text Processing in L2. 2018.
  3. Mladen Karan. Computer-Aided Construction and Semantic Search of Question and Answer Collections. 2017.
  4. Vedran Galetić (with Prof. Marijan Palmović). Formalization and Quantification of a Cognitively Motivated Conceptual Space Model Based on The Prototype Theory. 2016.
  5. Goran Glavaš. Text Information and Retrieval Based on Event Graphs. 2014.

Completed MA Theses

  1. Jure Baban. Debiasing Methods for Demographic Information in Textual Data. 2019.
  2. Mihaela Bošnjak (with Dr. Mladen Karan). Personality Type Detection using Transfer Learning and Language Models. 2019.
  3. Bruno Gavranović. Compositional Deep Learning. 2019.
  4. Viktor Golem. Machine Learning Models for Detecting Psychological Illnesses and Disorders of Text Authors. 2019.
  5. Robert Injac. Predicting Text Readability using Natural Language Processing Methods. 2019.
  6. Marin Krešo (with Dr. Mladen Karan). Text Classification for Croatian using Text Embeddings Derived from Contextualized Language Models. 2019.
  7. Marin Kukovačec. Deep Learning Models for Automated Document Summarization. 2019.
  8. David Lozić. A System for Extrinsic and Intrinsic Plagiarism Detection in Student Theses. 2019.
  9. Ivan Mršić. Recognition and Classification of Mental Health Issues from Short Text using Multi-Task Learning. 2019.
  10. Lukrecija Puljić. Computational Models for Determining the Personality Type of Text Authors. 2019.
  11. Tome Radman. Resource-Scarce Transfer Learning Using Language Model Pretraining. 2019.
  12. Antonio Šajatović. Bayesian Deep Learning for Text Classification. 2019.
  13. Bruno Šarlija (with Dr. Mladen Karan). Learning-to-Rank Model for Croatian based on Contextualized Multilingual Language Models. 2019.
  14. Leonard Volarić-Horvat. Computational Stylometry Analysis of Croatian Parliamentary Discussions. 2019.
  15. Marin Biloš. Deep Learning Models for Reading Comprehension. 2018.
  16. Ana Brassard. Text-based Religious Affiliation Prediction. 2018.
  17. Luka Dulčić. Customer Support Chatbot Based on Deep Learning. 2018.
  18. Bartol Freškura. Customer Support Chatbot Based on Deep Learning. 2018.
  19. Martin Gluhak. Extracting Semantic Relations between Entities from Scientific Texts. 2018.
  20. Paula Gombar. Recency Ranking Models for Web Search. 2018.
  21. Tin Kuculo. A Neural-Based Text Generation System. 2018.
  22. Mihael Nikić. Machine Translation Model Based on Convolutional Neural Networks. 2018.
  23. Ivan Paljak. Semantic Taxonomy Learning from Online Debates. 2018.
  24. Ivan Sekulić. Text-Based Mental Disorder Prediction from Online Discussions Texts. 2018.
  25. Fredi Šarić. Event Linking in Large News Collections. 2018.
  26. Jasmin Zukić. A Self-Learning Web Application for a Word Guessing Game Implemented in Haskell. 2018.
  27. Filip Čulinović. Evaluating Croatian Language Word Representations. 2017.
  28. Tomislav Marinković. Deep Learning Models for the Analysis of User Comments on Social Networks. 2017.
  29. Matej Paradžik. Domain Adaptation for Sentiment Analysis from Text. 2017.
  30. Tena Perak. Semantic Analysis of Math Word Problems. 2017.
  31. Luka Skukan. Application of Compositional Distributional Semantics for Semantic Text Similarity. 2017.
  32. Maja Buljan. Multiword Identification Based on the Combination of Linguistic Features. 2016.
  33. Vjeran Crnjak. Learning to Search for Solving Natural Language Processing Tasks. 2016.
  34. Zoran Medić. Compositional Distributional Semantics Based on the Lexical Function Model. 2016.
  35. Dino Radaković. A Joint Model for Named Entity Relation Extraction. 2016.
  36. Sven Vidak. Deep Learning for Language Modeling of the Croatian Language. 2016.
  37. Toni Antunović. Automated Extraction of Bilingual Lexicons Based on Semantic Vector Spaces. 2015.
  38. Krešimir Baksa. Shallow Semantic Parsing of Croatian Texts. 2015.
  39. Dino Dolović. Sentiment Analysis in Tweets in Croatian Language. 2015.
  40. Goran Gašić. Deep Learning of Word Embeddings for Tagging Models for Croatian Texts. 2015.
  41. Lana Lisjak. Recognizing Textual Entailment in Croatian Texts. 2015.
  42. Hermina Petric Maretić. Project Proposals Analysis using Statistical Natural Language Processing. 2015.
  43. Mihael Šafarić. Feature Selection and Document Representation Methods for Text Classification. 2015.
  44. Petra Almić. A Model for Determining Semantic Compositionality of Croatian Multi-Word Expressions. 2014.
  45. Marko Bekavac. Word Sense Induction and Discrimination Model for Croatian Words. 2014.
  46. Petra Bevandić. Optimizing Dependency Parsing Parameters for Croatian Language. 2014.
  47. Siniša Biđin. Using Deep Learning for Sentiment Analysis of Croatian Expressions. 2014.
  48. Luka Krajcar. Sentiment Analysis of Tweets in Croatian Language. 2014.
  49. Lovro Rožić (with Mladen Vuković). Functional Programming. 2014.
  50. Martin Tutek. Multi-label Document Classification using EuroVoc Thesaurus. 2014.
  51. Leo Zuanović. Recurrent Neural Network Based Model of Croatian Language. 2014.
  52. Filip Petkovski. Application of Partial Membership Models to Keyphrase Extraction from Croatian Documents. 2013.
  53. Tin Franović. Classification of Email Importance Based on Speech Acts. 2013.
  54. Matija Hanževački. Coreference Resolution in Croatian Texts. 2013.
  55. Josip Bakić. Automatic Content Extraction from Web Pages. 2012.
  56. Sonja Grđan. Application of Machine Learning Methods for EEG-Based Brain-Computer Interface. 2012.
  57. Ante Kegalj. Sentiment Analysis Based on Prior Word Polarity. 2012.
  58. Ivan Krišto. Using Machine Learning Methods to Improve Document Retrieval. 2012.
  59. Tomislav Lombarović. Named Entity Recognition and Classification for Text in Croatian Language. 2012.
  60. Mladen Marović. Event and Temporal Relation Extraction in Croatian Language Texts. 2012.
  61. Hrvoje Peradin. Constraint Grammar-based Parsing of Croatian Texts. 2012.
  62. Veljko Srdarević. Text Report Generation Based on Structured Data. 2012.
  63. Fran Dragomanović (with Prof. Bojana Dalbelo Bašić). Acronym Extraction in Croatian Language. 2011.
  64. Zoran Hranj. Unsupervised Coreference Resolution. 2011.
  65. Vedrana Janković. Computational Models of Distributional Lexical Semantics in Croatian Language. 2011.
  66. Ivan Kmetović. Matching Co-referent Named Entities Using Machine Learning. 2011.
  67. Slavko Kručaj. Applying Machine Learning Methods to User Review Summarization. 2011.
  68. Ivan Kusalić. Application of Topic Models to Analysis of Croatian Documents. 2011.
  69. Ognjen Lajšić. Grammar and Style Checker for Croatian Language. 2011.
  70. Vladimir Manzin. Computer Agents for Poker. 2011.
  71. Vjekoslav Osmann. Tagging Parts of Speech in Croatian Texts. 2011.
  72. Paško Pajdek. Deep Generative Models for Semantic Document Clustering. 2011.
  73. Josip Saratlija. Unsupervised Parser for Croatian Language. 2011.
  74. Nikola Šantić. Automatic Paraphrasing of Croatian Expressions and Sentences. 2011.
  75. Matea Biočić (with Prof. Bojana Dalbelo Bašić). Word Sense Discrimination Using Expectation Maximization Algorithm. 2010.
  76. Zlatan Hot (with Prof. Bojana Dalbelo Bašić). A Stemming Algorithm Based on String Clustering. 2010.
  77. Marin Japec (with Prof. Bojana Dalbelo Bašić). System for Organizing and Sharing Knowledge Based on Topic Maps. 2010.
  78. Matija Lacković (with Prof. Bojana Dalbelo Bašić). Program Environment for Execution of Tournaments for Game Playing Algorithms. 2010.
  79. Nikola Novak (with Prof. Bojana Dalbelo Bašić). Implementation of a Game Simulator and Checkers Game-playing Algorithms. 2010.
  80. Ivan Šolta (with Prof. Bojana Dalbelo Bašić). Determining Semantic Orientation of Subjective Words and Phrases. 2010.
  81. Davor Delač (with Prof. Bojana Dalbelo Bašić). Collocation Extraction from Corpus. 2009.
  82. Lovro Žmak (with Prof. Bojana Dalbelo Bašić). FAQ Retrieval System for Croatian Language. 2009.
  83. Srđan Vuković (with Prof. Bojana Dalbelo Bašić). A Heuristic Algorithm for Matching of Address Data. 2008.

Completed BA Theses

  1. Livio Benčik. Interpreting Neural Models for Natural Language Processing. 2019.
  2. Martin Čolja. Machine Learning Models for Toxic Comments Detection on the Internet. 2019.
  3. Dominik Čubelić. Semantic Segmentation of Invoice Text. 2019.
  4. Fran Grgić. Classification of Importance and Urgency of Push Notifications using Machine Learning. 2019.
  5. Ivan Lovrečić. Author Profiling of Social Media Users. 2019.
  6. Niko Palić. Product Review Sentiment Analysis. 2019.
  7. Marin Petričević. Deep Learning Models for Predicting Users' Comment Controversiality. 2019.
  8. Juraj Vladika. Deep Learning Models for Extractive Document Summarization. 2019.
  9. Ivan Crnomarković. Machine Learning Models for Narrative Text Understanding. 2018.
  10. Fabijan Čorak. Hate Speech Detection on Social Networks Using Machine Learning. 2018.
  11. Mihovil Ilakovac. Machine Learning Models for Sarcasm Detection in Users' Comments. 2018.
  12. Vinko Kašljević. Cross-Domain Sentiment Analysis Methods for Croatian Language. 2018.
  13. Roman Kerčmar. Sentiment Analysis from Customer Reviews of Hospitality and Catering Businesses. 2018.
  14. Mate Mijolović. Deep Learning Models for Named Entity Recognition and Classification. 2018.
  15. Gregor Orlić. Machine Learning Models for Emoji Prediction from Text. 2018.
  16. Josip Torić. Computational Statistical Analysis of the Language of Religious Discussions on Internet Forums. 2018.
  17. Fran-Andrija Arbanas. Question Type Classification for a Natural Language Database Interface. 2017.
  18. Marin Kukovačec. Automated Sarcasm Detection in Social Network Users' Comments. 2017.
  19. Toni Kukurin. Claim and Stance Classification in Online Discussions Using Machine Learning. 2017.
  20. David Lozić. Intrinsic Plagiarism Detection in Student Theses. 2017.
  21. Juraj Malenica. Question Context Prediction for an Interactive Natural Language Database Interface. 2017.
  22. Ivan Mršić. Entity Recognition and Classification for a Natural Language Database Interface. 2017.
  23. Lukrecija Puljić. Author Profiling on Social Networks Using Machine Learning. 2017.
  24. Filip Šaina. Sentiment Summarization from Student Course Questionnaires. 2017.
  25. Antonio Šajatović. Predicting Newsworthiness of News Stories Using Machine Learning. 2017.
  26. Doria Šarić. Computational Analysis of the Similarity of Math Word Problems. 2017.
  27. Ivan Tokić. Cross-Lingual Plagiarism Detection from Wikipedia. 2017.
  28. Bartol Freškura. Application of Deep Learning for Stance Detection in User Comments. 2016.
  29. Bruno Gavranović. Application of Deep Learning for Sentiment Analysis. 2016.
  30. Filip Hrenić. Detection of Inappropriate Messages in Online Chats. 2016.
  31. Marin Kačan. Detecting Lexical Transfer Errors of Second Language Learners. 2016.
  32. Mihael Nikić. Application of Machine Learning for Topic-Based Sentiment Analysis. 2016.
  33. Stipan Mikulić. Use of Distributional Semantic Models in the Word Association Game. 2016.
  34. Filip Čulinović. Acquisition of Verb Classes from Corpus using Unsupervised Machine Learning. 2015.
  35. Paula Gombar. Contextual Sentiment Analysis of Croatian Expressions. 2015.
  36. Ivan Paljak. Stance Classification and Analysis in Online User Comments. 2015.
  37. Ivan Sekulić. Extraction of Semantic Verb Relations from Croatian Corpora. 2015.
  38. Jura Šlosel. Entity-Based Coherence Model for Croatian Texts. 2015.
  39. Vjeran Crnjak. Part-of-Speech Tagging for Croatian using Conditional Random Fields. 2014.
  40. Stjepan Glavina. Machine Learning of Document Classification Rules. 2014.
  41. Zoran Medić. Quotation Extraction from News Stories in Croatian Language. 2014.
  42. Matej Paradžik. Semi-Supervised Acquisition of Sentiment Polarity Lexicon. 2014.
  43. Dino Radaković. Applying Semantic Kernel Functions in Text Classification. 2014.
  44. Luka Skukan. Temporal Expression Tagging for Croatian Texts. 2014.
  45. Sandra Trkulja. Feature Construction and Selection for Document Classification in Croatian Language. 2014.
  46. Sven Vidak. Offensive Text Detection using Machine Learning Methods. 2014.
  47. Ivana Balažević. Document Clustering Using Self-organizing Neural Networks. 2012.
  48. Marko Bekavac. Application of Genetic Programming in Keyphrase Extraction. 2012.
  49. Petra Bevandić. Automatic Natural Language Identification. 2012.
  50. Goran Gašić. Automatic Tagging of Croatian Newswire Articles. 2012.
  51. Luka Krajcar. Error Correction in Texts Produced by Speech Recognition of Croatian. 2012.
  52. Zolik Nemet. Extraction of Acronyms from Corpus of Texts in Croatian Language. 2012.
  53. Roko Pancirov. Automatic Extraction of Bilingual Dictionaries Based on Wikipedia. 2012.
  54. Martin Tutek. Using Wikipedia for Automatic Word Sense Disambiguation. 2012.
  55. Leo Zuanović. Machine Learning of Croatian Lemmatization Rules. 2012.
  56. Siniša Biđin. A Controlled Natural Language Parser. 2011.
  57. Matija Hanževački. Temporal Expression Tagging in Croatian Texts. 2011.
  58. Ante Kegalj (with Prof. Bojana Dalbelo Bašić). Automated Sentence Boundary Detection. 2010.
  59. Tomislav Lombarović (with Prof. Bojana Dalbelo Bašić). Question Type Classification for Information Retrieval Systems. 2010.
  60. Mladen Marović (with Prof. Bojana Dalbelo Bašić). OCR Error Correction. 2010.
  61. Mladen Mikša (with Prof. Bojana Dalbelo Bašić). Correction of Merged Words Errors in Texts Obtained by Optical Character Recognition. 2010.
  62. Veljko Srdarević (with Prof. Bojana Dalbelo Bašić). Building a Stemming Algorithm Using Genetic Programming. 2010.
  63. Zoran Hranj (with Prof. Bojana Dalbelo Bašić). Structure-Based Web Page Comparison Algorithm. 2009.
  64. Ivan Karačić (with Prof. Bojana Dalbelo Bašić). Word Sense Discrimination. 2009.
  65. Ivan Kmetović (with Prof. Bojana Dalbelo Bašić). Keyword Extraction from Text Using Decision Trees. 2009.
  66. Ivan Krišto (with Prof. Bojana Dalbelo Bašić). Web Page Cleaning Techniques for Text Mining. 2009.
  67. Ognjen Lajšić (with Prof. Bojana Dalbelo Bašić). OCR Error Correction. 2009.
  68. Josip Saratlija (with Prof. Bojana Dalbelo Bašić). Keyword Extraction Based on Document Clustering. 2009.
  69. Nikola Šantić (with Prof. Bojana Dalbelo Bašić). Automatic Diacritics Restoration in Croatian Texts. 2009.
  70. Igor Šoš (with Prof. Bojana Dalbelo Bašić). Client Side of Distributed Linguistic Resource Annotator. 2009.
  71. Marin Japec (with Prof. Bojana Dalbelo Bašić). Dialogue System in Croatian Language. 2008.
  72. Željko Rumenjak (with Prof. Bojana Dalbelo Bašić). Distributed Linguistic Resource Annotator. 2008.
  73. Ivan Šolta (with Prof. Bojana Dalbelo Bašić). Query Correction Based on Levenshtein Distance. 2008.
