This repository was archived by the owner on Jan 4, 2024. It is now read-only.

Commit 40f95f2

Fix spelling errors
1 parent 95ed721 commit 40f95f2

7 files changed

+115
-115
lines changed

chapters/chapter01.tex

Lines changed: 26 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -25,22 +25,22 @@
2525
\section{Motivation}
2626
\label{s:introduction-motivation}
2727

28-
Many studies have been published which are trying to predict the stock market movement \citep[see][]{Bollen2011a,Mittal2012a,Nguyen2015a,Pagolu2016a,Zhang2011a}.
29-
As the \ac{EMH} states that financial market movements depend on news, current events and product releases and all these factors will have significant impact on a company's stock value
28+
Many studies have been published which try to predict the stock market movement \citep[see][]{Bollen2011a,Mittal2012a,Nguyen2015a,Pagolu2016a,Zhang2011a}.
29+
The \ac{EMH} states that financial market movements depend on news, current events and product releases and all these factors will have significant impact on a company's stock value
3030
\citep{fama1965behavior}.
31-
Due the fact that news and current events are unpredictable stock market prices are following a random walk pattern and cannot predicted with more than \SI{50}{\percent} accuracy
31+
Due to the fact that news and current events are unpredictable, stock market prices follow a random walk pattern and cannot be predicted with more than \SI{50}{\percent} accuracy
3232
\citep{Pagolu2016a}.
3333

34-
\citet{Malkiel2003} noted that with the beginning of the new millennium financial economists believed that stock prices are at least partly predictable.
34+
\citet{Malkiel2003} noted that with the beginning of the new millennium, financial economists believed that stock prices are at least partly predictable.
3535
They emphasized the behavioral and psychological elements of stock price determination.
3636

37-
Many internet users are microblogging nowadays.
38-
Millions of messages are published daily on popular websites which provides microblogging services, such as Twitter, Tumblr and Facebook.
39-
These published messages describing the personal life, opinions or current issues.
40-
The more users are post about products and services they use the more microblogging websites become a valuable source of peoples opinions and sentiments.
37+
Nowadays many internet users are microblogging.
38+
Millions of messages are published daily on popular websites which provide microblogging services, such as Twitter, Tumblr and Facebook.
39+
These published messages describe the personal life, opinions or current issues.
40+
The more users post about products and services they use, the more microblogging websites become a valuable source of peoples' opinions and sentiments.
4141
Therefore, this data can be used for marketing, social studies and as a measure of public opinion
4242
\citep{Patodkar2016a, Pagolu2016a}.
43-
As most Twitter messages have a maximum length of 140 characters and speaks public opinion on a topic precisely
43+
Most Twitter messages have a maximum length of 140 characters and represents the public opinion on a precise topic
4444
\citep{Pagolu2016a}.
4545

4646
Combining these two research fields (namely \ac{EMH} and Twitter) should enable us to investigate whether stock prices can be predicted via public opinions on Twitter.
@@ -51,7 +51,7 @@ \section{Research Goals}
5151
According to the factors presented in \cref{s:introduction-motivation} the central research question can be formulated:
5252
\emph{To what extent can stock market movements be explained by the public opinion extracted from Twitter?}
5353

54-
The goal of this research to analyze the correlation between sentiment of tweets and share movement of automotive companies.
54+
The goal of this research is to analyze the correlation between the sentiment of tweets and the share movement of automotive companies.
5555
This goal will be met by achieving the following objectives:
5656

5757
\begin{itemize}
@@ -60,23 +60,23 @@ \section{Research Goals}
6060
\item \textbf{G3} - Comparing sentiment time series with share prices
6161
\end{itemize}
6262

63-
From definitions of goals and having the central question in mind the following sub tasks are defined in form of questions in order to fulfill the goals:
63+
Based on the definitions of goals and having the central question in mind, the following sub tasks are defined in the form of questions in order to fulfill the goals:
6464

6565
\begin{itemize}
6666
\item \textbf{G1-Q1} - Which companies should be analyzed?
6767
\item \textbf{G1-Q2} - Which keywords should be used to find corresponding tweets?
6868
\item \textbf{G1-Q3} - Which company uses which stock symbol in order to retrieve share prices?
69-
\item \textbf{G2-Q4} - Why Twitter and not anything else?
70-
\item \textbf{G2-Q5} - In which way tweets can be collected?
71-
\item \textbf{G2-Q6} - In which way sentiments can be determined?
69+
\item \textbf{G2-Q4} - Why Twitter and not any other social media platform?
70+
\item \textbf{G2-Q5} - In which way can tweets be collected?
71+
\item \textbf{G2-Q6} - In which way can sentiments be determined?
7272
\item \textbf{G2-Q7} - Which sentiments are present for various companies?
7373
\item \textbf{G3-Q8} - Can the time series of sentiments explain the share prices?
7474
\end{itemize}
7575

7676
\section{Research Methodology}
7777
\label{s:introduction-researchmethodology}
7878

79-
The research follows a structure deducted from ``evaluation techniques for systems analysis and design modelling methods'' by \citet{Siau2011} in which the authors try to show up the benefits and the shortcomings of different methods.
79+
The research follows a structure deducted from ``evaluation techniques for systems analysis and design modelling methods'' by \citet{Siau2011} in which the authors try to show the benefits and the shortcomings of different methods.
8080
In the following the three main categories and their mapping to this thesis are shown:
8181

8282
\begin{description}
@@ -93,24 +93,25 @@ \section{Research Methodology}
9393
is used to compare the results of the case study with share prices of the automotive companies.
9494
\end{description}
9595

96-
As this thesis covers sentiments of people in a global context which is then compared to share prices in an economic context it can be classified as social science \citep{Recker2013}.
97-
In the following the research actions, which have been undertaken to answer the questions and fulfill the goals, are explained.
96+
As this thesis covers sentiments of people in a global context which are then compared to share prices in an economic context it can be classified as social science \citep{Recker2013}.
97+
In the following the research actions which will be undertaken to answer the questions and fulfill the goals are explained.
9898

9999
\begin{itemize}
100-
\item To find answers to the questions \textbf{Q1} to \textbf{Q5} literature research has been conducted.
101-
A keyword search has been performed on the literature search-engine \emph{Google Scholar} as well as library search.
102-
The retrieved literature is reviewed and based on the references new literature is obtained.
100+
\item To find answers to the questions \textbf{Q1} to \textbf{Q5} literature research will be conducted.
101+
A keyword search will be performed on the literature search engine \emph{Google Scholar}.
102+
Furthermore, the library will be searched as well.
103+
The retrieved literature is reviewed and, based on the references, new literature is obtained.
103104

104105
\item With the theoretical background which has been obtained in answering the questions \textbf{Q1} to \textbf{Q6} a tweet collection system has been set up in order to answer the question \textbf{Q7}.
105-
This is done by setting up a open source tweet capturing system \ac{DMITCAT} system and evaluating the sentiment of the captured tweets.
106+
This is done by setting up an open source tweet capturing system (\ac{DMITCAT}) and evaluating the sentiment of the captured tweets.
106107

107-
\item Question \textbf{Q8} is answered through both literature research, which has been collected for the questions \textbf{Q1} to \textbf{Q6} and evaluated sentiments of the collected tweets for question \textbf{Q7}.
108+
\item Question \textbf{Q8} is answered through both literature research, which has been collected for the questions \textbf{Q1} to \textbf{Q6}, and evaluated sentiments of the collected tweets for question \textbf{Q7}.
108109
\end{itemize}
109110

110111
\section{Structure of this Thesis}
111112
\label{s:introduction-structureofthisthesis}
112113

113114
This section is followed by the background \cref{c:background}, where the necessary theoretical background will be explained.
114-
In \cref{c:casestudy}, the setup of tweet collection is explained and the execution documented.
115-
Afterwards, in \cref{c:analysis}, sentiments of collected tweets are determined and converted into a time series which is then compared to the time series of share prices.
116-
Finally, in \cref{c:conclusion} the results of this work are summed up, and limitations and further points of interest are pointed out.
115+
In \cref{c:casestudy}, the setup of tweet collection will be explained and the execution documented.
116+
Afterwards, in \cref{c:analysis}, sentiments of collected tweets will be determined and converted into a time series which will then be compared to the time series of share prices.
117+
Finally, in \cref{c:conclusion} the results of this work will be summed up, and limitations and further points of interest will be pointed out.

chapters/chapter02.tex

Lines changed: 4 additions & 4 deletions
@@ -9,8 +9,8 @@
99
This section should provide the theoretical background and the foundation of the conducted study.
1010
Therefore, this section is structured as follows:
1111
first, related work of stock market prediction will be presented in \cref{s:background-stockmarketprediction};
12-
second, an introduction into option mining will be given in \cref{s:background-optionmining};
13-
and third, the market of social networks will be examined in \cref{s:background-socialnetworks}.
12+
secondly, an introduction into option mining will be given in \cref{s:background-optionmining};
13+
and thirdly, the market of social networks will be examined in \cref{s:background-socialnetworks}.
1414

1515
\section{Stock Market Prediction}
1616
\label{s:background-stockmarketprediction}
@@ -40,7 +40,7 @@ \section{Stock Market Prediction}
4040

4141
These effects can be also applied to the stock markets: not just news influences the stock market but also the public opinion and mood.
4242
Previously large surveys have been conducted to gather the public mood of a representative sample.
43-
This was very time consuming and expensive.
43+
This was very time-consuming and expensive.
4444
But in the last ten years a significant progress has been made in sentiment tracking techniques.
4545
Therefore the sentiments can be extracted from news and blogs
4646
\citep{Bollen2011a}.
@@ -80,7 +80,7 @@ \section{Option Mining}
8080
\end{enumerate}
8181

8282
This study will focus on short documents with given keywords in it.
83-
Therefore, we assume that the documents describing our targeted topic (see \cref{s:background-socialnetworks} on page \pageref{s:background-socialnetworks} for the background).
83+
Therefore, it is assumed that the documents describe the targeted topic (see \cref{s:background-socialnetworks} on page \pageref{s:background-socialnetworks} for the background).
8484
As a result the study will focus on sentiment classification.
8585

8686
Sentiment classification has some similarities with topic-based text classification, which classifies the topic of documents into predefined topic classes, for example sports, science or politics.

chapters/chapter03.tex

Lines changed: 6 additions & 6 deletions
@@ -30,7 +30,7 @@ \section{Determine Companies, Keywords and Stock Symbols to Analyze}
3030
These companies must be traded on a stock exchange to perform the comparison with tweet sentiments.
3131
As a single company may own several car brands a list of all brands has been set up.
3232
The result of the analysis is depicted in \cref{tab:casestudy-brands}.
33-
Both brands which aren't customer facing passenger car brands and brands which do not longer exist have been omitted.
33+
Both brands which aren't customer-facing passenger car brands and brands which do not longer exist have been omitted.
3434
Furthermore, the brands have been grouped by their owning company.
3535

3636
\begin{longtable}[c]{!l ^l}
@@ -139,7 +139,7 @@ \subsection{Gather Tweets}
139139

140140
A large set of tweets is needed to perform the analysis within a time frame of at least one month.
141141
There were several approaches to get these tweets: download tweets directly or capture tweets within the given time frame.
142-
As we are tracking five companies using 23 keywords (brands) there will be a quite big amount of data.
142+
As the tracking includes five companies using 23 keywords (brands) there will be a quite big amount of data.
143143

144144
Several approaches have been tried to get as many tweets as possible to the given keywords, including:
145145

@@ -148,7 +148,7 @@ \subsection{Gather Tweets}
148148
was the first attempt.
149149
But there were very serious limitations to the official \ac{API} that made that quite easy way impossible.
150150
First, the standard search \ac{API} supports just a maximum count of 100 tweets;
151-
second, it supports a history of only seven days;
151+
secondly, it supports a history of only seven days;
152152
and lastly, there were to tight rate limits defined in order gather all possible tweets of the seven days period \citep{TwitterInc.2018}.
153153

154154
\item [Twitter search on website]
@@ -190,7 +190,7 @@ \subsection{Gather Tweets}
190190
The storage was full after approximately 14 days of data collection.
191191
As the problem was not detected right away it took several days for identifying and fixing the issue.
192192

193-
\item The rate limits of the \ac{API} have been hit now and then in case too many tweets were published.
193+
\item The rate limits of the \ac{API} were hit now and then in case too many tweets were published.
194194
\ac{DMITCAT} continued to collect tweets automatically after the corresponding time window.
195195

196196
\item New releases \ac{DMITCAT} have been published from time to time which also required a database upgrade.
@@ -399,7 +399,7 @@ \section{Determine Sentiment of Tweets}
399399
\citep{buitinck2013api}.
400400

401401
% Cross validation with GridSearchCV
402-
Furthermore \emph{scikit-learn} provides some helpers to find the best hyper-parameters for the given problem.
402+
Furthermore, \emph{scikit-learn} provides some helpers to find the best hyper-parameters for the given problem.
403403
The user can define which values various hyper-parameters can attain and the helper then perform test runs for various combinations, calculate their score and keep acting as the best performing model.
404404
Therefore, this type of search is called \emph{model selection}.
405405
\emph{Scikit-learn} provides two different model selection helpers: \emph{GridSearchCV} and \emph{RandomizedSearchCV}.
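
The model selection idea described in this hunk (try the combinations of user-defined hyper-parameter values, score each one, keep the best) can be illustrated with a plain-Python sketch. This is not scikit-learn's \emph{GridSearchCV}, which additionally cross-validates every candidate; the function name `grid_search`, the parameter names and the toy scoring function are hypothetical and only stand in for training and scoring a real classifier.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid and return the best one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    # Exhaustively enumerate the Cartesian product of all value lists.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid and toy score; a real run would train and
# cross-validate a classifier for each candidate instead.
param_grid = {"alpha": [0.01, 0.1, 1.0], "ngram_max": [1, 2, 3]}


def toy_score(params):
    return -abs(params["alpha"] - 0.1) - abs(params["ngram_max"] - 2)


best_params, best_score = grid_search(param_grid, toy_score)
```

A randomized search differs only in that it samples a fixed number of combinations instead of enumerating all of them, which can be cheaper when the grid is large.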
@@ -542,7 +542,7 @@ \section{Determine Sentiment of Tweets}
542542
The stock prices are already in a time series format on a daily basis except for weekends or holidays but the problem of missing entries are tackled later.
543543
First, the sentiment analysis result dataset must be condensed to form a time series for comparison.
544544
Therefore the results per tweet are grouped per day and summed up.
545-
As negative sentiments have the value \texttt{'-1'} and positive sentiments have the value \texttt{'1'} we receive a number which is positive in case more positive than negative tweets have been published on that given day and vice versa.
545+
As negative sentiments have the value \texttt{'-1'} and positive sentiments have the value \texttt{'1'} a number is received which is positive in case more positive than negative tweets have been published on that given day and vice versa.
546546

547547
Missing stock prices have been calculated iteratively and the gaps have been filled by using the following procedure:
548548
Given $x$ is a stock price value and $y$ is the next present value with one or more values in between are missing.
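
The daily condensation described in this hunk (group the per-tweet results by day and sum them, with negative tweets counted as -1 and positive tweets as 1) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the thesis' actual code; the function name `daily_sentiment`, the tuple layout and the sample data are hypothetical, and labels are assumed to be the integers -1 and 1.

```python
from collections import defaultdict
from datetime import datetime


def daily_sentiment(tweets):
    """Sum per-tweet sentiment labels (-1 or 1) for each calendar day."""
    totals = defaultdict(int)
    for timestamp, label in tweets:
        day = datetime.fromisoformat(timestamp).date()
        totals[day] += label
    # Sort by day so the result reads as a time series.
    return dict(sorted(totals.items()))


# Hypothetical sample: (ISO timestamp, sentiment label) per tweet.
sample = [
    ("2018-03-01T09:15:00", 1),
    ("2018-03-01T17:40:00", 1),
    ("2018-03-01T21:05:00", -1),
    ("2018-03-02T08:30:00", -1),
    ("2018-03-02T12:00:00", -1),
]

series = daily_sentiment(sample)
```

A positive daily value means that more positive than negative tweets were published on that day, and a negative value means the opposite, which matches the interpretation given in the text.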
