.. _Resources:

##########
Resources
##########

This page contains links to resources, tools, courses, blogs, newsletters and other interesting things relating to text mining for research.

Tools and software
~~~~~~~~~~~~~~~~~~

- This `universal sentence encoder <https://tfhub.dev/google/universal-sentence-encoder-xling-many/1>`_ is reportedly good for clustering sentences or other short texts
- A list of `open source data labeling tools <https://github.com/heartexlabs/awesome-data-labeling#readme>`_
- See the `full list of 80+ tools (web apps, software, packages in R and Python) <https://sagepublishing.github.io/sage_tools_social_science/2020/01/20/text-mining.html>`_, with an overview of each tool's type and whether it is free or paid.


Books and textbooks
~~~~~~~~~~~~~~~~~~~

- `Computational Social Science with Python <https://github.com/damian0604/bdaca/blob/master/book/bd-aca_book.pdf>`_ by Damian Trilling 
- `Natural Language Processing with Python <http://www.nltk.org/book/ch01.html>`_, by Steven Bird, Ewan Klein and Edward Loper, Copyright © 2019

Course materials
~~~~~~~~~~~~~~~~

- `Big data and automated content analysis <https://github.com/damian0604/bdaca>`_ by Damian Trilling
- `The Python Tutorial <https://docs.python.org/3/tutorial/index.html>`_
- `NLP for developers <https://www.youtube.com/watch?v=hJ1hzEJE16c&list=PL75e0qA87dlFJiNMeKltWImhQxfFwaxvv>`_ by Rasa
- `Start to end training on wikipedia corpus for topic modeling <https://www.youtube.com/watch?v=3mHy4OSyRf0>`_
- `Tips for computational text analysis from Simon Brown at Berkeley <http://matrix.berkeley.edu/research/tips-computational-text-analysis>`_
- Lessons and materials for teaching text analysis from the `Programming Historian <https://programminghistorian.org/en/lessons/>`_
- An introduction `on how to use pre-trained vector embeddings <https://www.shanelynn.ie/word-embeddings-in-python-with-spacy-and-gensim/>`_
- For shorter texts, such as single sentences, which likely contain just one topic (whereas LDA assumes multiple topics per document), there is the `GSDMM model <https://towardsdatascience.com/a-unique-approach-to-short-text-clustering-part-1-algorithmic-theory-4d4fad0882e1>`_, with the full pipeline explained `here <https://towardsdatascience.com/short-text-topic-modeling-70e50a57c883>`_
- Slide deck on `Computational Analysis of Political Texts <https://docs.google.com/presentation/d/1Pm2obVYPjruc-zR2URnNVd5ndtAek2wwPn4JpX-Svx8/edit>`_ from the Data and Web Science Group at the University of Mannheim

Newsletters
~~~~~~~~~~~

- `Sebastian Ruder's NLP News <http://newsletter.ruder.io/>`_, probably the most comprehensive newsletter out there, with deep dives into the technical and economic aspects of text mining.
- `The Gradient <https://thegradientpub.substack.com/>`_ publishes overviews, essays and perspectives on Artificial Intelligence, recent developments and long-term impacts. It is run by volunteers and open to submissions.


Other interesting resources
~~~~~~~~~~~~~~~~~~~~~~~~~~~

- An overview of text mining in the social sciences and humanities by Dong Nguyen, `arxiv preprint <https://arxiv.org/pdf/1907.01468.pdf>`_
- A `critique of computational text analysis <https://opinionator.blogs.nytimes.com/2012/01/23/mind-your-ps-and-bs-the-digital-humanities-and-interpretation/>`_ in the humanities
- An academic paper from the Workshop on Computational Humanities Research 2020 on the history of quantitative and computational research in the humanities, especially quantitative methods in history before computers; by `Michael Piotrowski and Mateusz Fafinski <http://ceur-ws.org/Vol-2723/short16.pdf>`_
- Estimating the degree of similarity between two texts, `a blog post by Adrien Sieg <https://medium.com/@adriensieg/text-similarities-da019229c894>`_ (2018)
- `Masakhane <https://www.masakhane.io>`_ is a grassroots NLP community for Africans, by Africans. It brings people together to work on challenging research problems for African languages. Their recent EMNLP 2020 findings paper demonstrates the impact grassroots efforts can have. 
- A super long and comprehensive list of great NLP resources by `Keon <https://github.com/keon/awesome-nlp>`_