.. Arabica documentation master file, created by
   sphinx-quickstart on Sat Nov 19 22:00:38 2022.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to Arabica's documentation!
===================================

**Arabica** is a Python library for exploratory data analysis designed specifically for time-series text data. Many text datasets are now collected as repeated observations over time: social media conversations, research metadata, product reviews, newspaper headlines, central bankers' communication, and so on. Arabica provides:

* **Descriptive n-gram analysis**: n-gram frequencies
* **Time-series n-gram analysis**: n-gram frequencies over a period
* **Text visualization**: n-gram heatmap, line plot, word cloud
* **Sentiment analysis**: VADER sentiment classifier
* **Financial sentiment analysis**: with FinVADER
* **Structural breaks identification**: Jenks Optimisation Method

*N-grams* are contiguous sequences of *n* neighboring items (here, words) in a text. Some examples include:

* **unigram**: "dog", **bigram**: "dog, goes", **trigram**: "dog, goes, home"
* **unigram**: "flower", **bigram**: "flower, grows", **trigram**: "flower, grows, here"

Contents
--------

.. toctree::

   Installation
   Library architecture
   Descriptive n-gram analysis
   Time-series n-gram analysis
   Time-series text visualization
   Word cloud
   Heatmap
   Line plot
   Sentiment analysis
   Structural break analysis
   Breakpoint identification
   Use cases and tutorials

------

I created this project in my free time, and I hope Arabica will save you some time. You can invite me for coffee if Arabica helps you with your project, thesis, or research paper.