mrdavidkovacs
diff --git a/chapters/chapter01.tex b/chapters/chapter01.tex
Lines changed: 26 additions & 25 deletions
diff --git a/chapters/chapter02.tex b/chapters/chapter02.tex
Lines changed: 4 additions & 4 deletions
diff --git a/chapters/chapter03.tex b/chapters/chapter03.tex
Lines changed: 6 additions & 6 deletions
@@ -25,22 +25,22 @@
 \section{Motivation}
 \label{s:introduction-motivation}
 
-Many studies have been published which are trying to predict the stock market movement \citep[see][]{Bollen2011a,Mittal2012a,Nguyen2015a,Pagolu2016a,Zhang2011a}.
-As the \ac{EMH} states that financial market movements depend on news, current events and product releases and all these factors will have significant impact on a company's stock value
+Many studies have been published which try to predict the stock market movement \citep[see][]{Bollen2011a,Mittal2012a,Nguyen2015a,Pagolu2016a,Zhang2011a}.
+The \ac{EMH} states that financial market movements depend on news, current events and product releases, and that all these factors have a significant impact on a company's stock value
 \citep{fama1965behavior}.
-Due the fact that news and current events are unpredictable stock market prices are following a random walk pattern and cannot predicted with more than \SI{50}{\percent} accuracy
+Due to the fact that news and current events are unpredictable, stock market prices follow a random walk pattern and cannot be predicted with more than \SI{50}{\percent} accuracy
 \citep{Pagolu2016a}.
 
-\citet{Malkiel2003} noted that with the beginning of the new millennium financial economists believed that stock prices are at least partly predictable.
+\citet{Malkiel2003} noted that with the beginning of the new millennium, financial economists believed that stock prices are at least partly predictable.
 They emphasized the behavioral and psychological elements of stock price determination.
 
-Many internet users are microblogging nowadays.
-Millions of messages are published daily on popular websites which provides microblogging services, such as Twitter, Tumblr and Facebook.
-These published messages describing the personal life, opinions or current issues.
-The more users are post about products and services they use the more microblogging websites become a valuable source of peoples opinions and sentiments.
+Nowadays many internet users are microblogging.
+Millions of messages are published daily on popular websites which provide microblogging services, such as Twitter, Tumblr and Facebook.
+These published messages describe personal life, opinions or current issues.
+The more users post about products and services they use, the more microblogging websites become a valuable source of people's opinions and sentiments.
 Therefore, this data can be used for marketing, social studies and as a measure of public opinion
 \citep{Patodkar2016a, Pagolu2016a}. 
-As most Twitter messages have a maximum length of 140 characters and speaks public opinion on a topic precisely
+Most Twitter messages have a maximum length of 140 characters and represent the public opinion on a precise topic
 \citep{Pagolu2016a}.
 
 Combining these two research fields (namely \ac{EMH} and Twitter) should enable us to investigate whether stock prices can be predicted via public opinions on Twitter.
@@ -51,7 +51,7 @@ \section{Research Goals}
 According to the factors presented in \cref{s:introduction-motivation} the central research question can be formulated:
 \emph{To what extent can stock market movements be explained by the public opinion extracted from Twitter?}
 
-The goal of this research to analyze the correlation between sentiment of tweets and share movement of automotive companies.
+The goal of this research is to analyze the correlation between the sentiment of tweets and the share movement of automotive companies.
 This goal will be met by achieving the following objectives:
 
 \begin{itemize}
@@ -60,23 +60,23 @@ \section{Research Goals}
     \item \textbf{G3} - Comparing sentiment time series with share prices
 \end{itemize}
 
-From definitions of goals and having the central question in mind the following sub tasks are defined in form of questions in order to fulfill the goals:
+Based on the definitions of goals and having the central question in mind, the following sub tasks are defined in the form of questions in order to fulfill the goals:
 
 \begin{itemize}
     \item \textbf{G1-Q1} - Which companies should be analyzed?
     \item \textbf{G1-Q2} - Which keywords should be used to find corresponding tweets?
     \item \textbf{G1-Q3} - Which company uses which stock symbol in order to retrieve share prices?
-    \item \textbf{G2-Q4} - Why Twitter and not anything else?
-    \item \textbf{G2-Q5} - In which way tweets can be collected?
-    \item \textbf{G2-Q6} - In which way sentiments can be determined?
+    \item \textbf{G2-Q4} - Why Twitter and not any other social media platform?
+    \item \textbf{G2-Q5} - In which way can tweets be collected?
+    \item \textbf{G2-Q6} - In which way can sentiments be determined?
     \item \textbf{G2-Q7} - Which sentiments are present for various companies?
 	\item \textbf{G3-Q8} - Can the time series of sentiments explain the share prices?
 \end{itemize}
 
 \section{Research Methodology}
 \label{s:introduction-researchmethodology}
 
-The research follows a structure deducted from ``evaluation techniques for systems analysis and design modelling methods'' by \citet{Siau2011} in which the authors try to show up the benefits and the shortcomings of different methods.
+The research follows a structure derived from ``evaluation techniques for systems analysis and design modelling methods'' by \citet{Siau2011} in which the authors try to show the benefits and the shortcomings of different methods.
 In the following the three main categories and their mapping to this thesis are shown:
 
 \begin{description}
@@ -93,24 +93,25 @@ \section{Research Methodology}
 		is used to compare the results of the case study with share prices of the automotive companies.
 \end{description}
 
-As this thesis covers sentiments of people in a global context which is then compared to share prices in an economic context it can be classified as social science \citep{Recker2013}.
-In the following the research actions, which have been undertaken to answer the questions and fulfill the goals, are explained.
+As this thesis covers sentiments of people in a global context, which are then compared to share prices in an economic context, it can be classified as social science \citep{Recker2013}.
+In the following the research actions which will be undertaken to answer the questions and fulfill the goals are explained.
 
 \begin{itemize}
-	\item To find answers to the questions \textbf{Q1} to \textbf{Q5} literature research has been conducted.
-	A keyword search has been performed on the literature search-engine \emph{Google Scholar} as well as library search.
-	The retrieved literature is reviewed and based on the references new literature is obtained.
+	\item To find answers to the questions \textbf{Q1} to \textbf{Q5} literature research will be conducted.
+	A keyword search will be performed on the literature search engine \emph{Google Scholar}.
+	Furthermore, the library will be searched as well.
+	The retrieved literature is reviewed and, based on the references, new literature is obtained.
 
 	\item With the theoretical background which has been obtained in answering the questions \textbf{Q1} to \textbf{Q6} a tweet collection system has been set up in order to answer the question \textbf{Q7}.
-	This is done by setting up a open source tweet capturing system \ac{DMITCAT} system and evaluating the sentiment of the captured tweets.
+	This is done by setting up an open source tweet capturing system (\ac{DMITCAT}) and evaluating the sentiment of the captured tweets.
 
-	\item Question \textbf{Q8} is answered through both literature research, which has been collected for the questions \textbf{Q1} to \textbf{Q6} and evaluated sentiments of the collected tweets for question \textbf{Q7}.
+	\item Question \textbf{Q8} is answered through both literature research, which has been collected for the questions \textbf{Q1} to \textbf{Q6}, and evaluated sentiments of the collected tweets for question \textbf{Q7}.
 \end{itemize}
 
 \section{Structure of this Thesis}
 \label{s:introduction-structureofthisthesis}
 
 This section is followed by the background \cref{c:background}, where the necessary theoretical background will be explained. 
-In \cref{c:casestudy}, the setup of tweet collection is explained and the execution documented.
-Afterwards, in \cref{c:analysis}, sentiments of collected tweets are determined and converted into a time series which is then compared to the time series of share prices.
-Finally, in \cref{c:conclusion} the results of this work are summed up, and limitations and further points of interest are pointed out.
+In \cref{c:casestudy}, the setup of tweet collection will be explained and the execution documented.
+Afterwards, in \cref{c:analysis}, sentiments of collected tweets will be determined and converted into a time series which will then be compared to the time series of share prices.
+Finally, in \cref{c:conclusion} the results of this work will be summed up, and limitations and further points of interest will be pointed out.
@@ -9,8 +9,8 @@
 This section should provide the theoretical background and the foundation of the conducted study.
 Therefore, this section is structured as follows: 
 first, related work of stock market prediction will be presented in \cref{s:background-stockmarketprediction};
-second, an introduction into option mining will be given in \cref{s:background-optionmining};
-and third, the market of social networks will be examined in \cref{s:background-socialnetworks}.
+secondly, an introduction to opinion mining will be given in \cref{s:background-optionmining};
+and thirdly, the market of social networks will be examined in \cref{s:background-socialnetworks}.
 
 \section{Stock Market Prediction} 
 \label{s:background-stockmarketprediction}
@@ -40,7 +40,7 @@ \section{Stock Market Prediction}
 
 These effects can be also applied to the stock markets: not just news influences the stock market but also the public opinion and mood.
 Previously large surveys have been conducted to gather the public mood of a representative sample.
-This was very time consuming and expensive.
+This was very time-consuming and expensive.
 But in the last ten years a significant progress has been made in sentiment tracking techniques.
 Therefore the sentiments can be extracted from news and blogs
 \citep{Bollen2011a}.
@@ -80,7 +80,7 @@ \section{Option Mining}
 \end{enumerate}
 
 This study will focus on short documents with given keywords in it.
-Therefore, we assume that the documents describing our targeted topic (see \cref{s:background-socialnetworks} on page \pageref{s:background-socialnetworks} for the background).
+Therefore, it is assumed that the documents describe the targeted topic (see \cref{s:background-socialnetworks} on page \pageref{s:background-socialnetworks} for the background).
 As a result the study will focus on sentiment classification.
 
 Sentiment classification has some similarities with topic-based text classification, which classifies the topic of documents into predefined topic classes, for example sports, science or politics.
 
@@ -30,7 +30,7 @@ \section{Determine Companies, Keywords and Stock Symbols to Analyze}
 These companies must be traded on a stock exchange to perform the comparison with tweet sentiments.
 As a single company may own several car brands a list of all brands has been set up.
 The result of the analysis is depicted in \cref{tab:casestudy-brands}.
-Both brands which aren't customer facing passenger car brands and brands which do not longer exist have been omitted.
+Both brands which are not customer-facing passenger car brands and brands which no longer exist have been omitted.
 Furthermore, the brands have been grouped by their owning company.
 
 \begin{longtable}[c]{!l ^l}
@@ -139,7 +139,7 @@ \subsection{Gather Tweets}
 
 A large set of tweets is needed to perform the analysis within a time frame of at least one month.
 There were several approaches to get these tweets: download tweets directly or capture tweets within the given time frame.
-As we are tracking five companies using 23 keywords (brands) there will be a quite big amount of data.
+As the tracking includes five companies using 23 keywords (brands), there will be quite a large amount of data.
 
 Several approaches have been tried to get as many tweets as possible to the given keywords, including:
 
@@ -148,7 +148,7 @@ \subsection{Gather Tweets}
     was the first attempt.
     But there were very serious limitations to the official \ac{API} that made that quite easy way impossible.
     First, the standard search \ac{API} supports just a maximum count of 100 tweets;
-    second, it supports a history of only seven days;
+    secondly, it supports a history of only seven days;
     and lastly, there were to tight rate limits defined in order gather all possible tweets of the seven days period \citep{TwitterInc.2018}.
 
   \item [Twitter search on website]
@@ -190,7 +190,7 @@ \subsection{Gather Tweets}
     The storage was full after approximately 14 days of data collection.
     As the problem was not detected right away it took several days for identifying and fixing the issue.
 
-  \item The rate limits of the \ac{API} have been hit now and then in case too many tweets were published.
+  \item The rate limits of the \ac{API} were hit now and then when too many tweets were published.
   \ac{DMITCAT} continued to collect tweets automatically after the corresponding time window.
 
   \item New releases \ac{DMITCAT} have been published from time to time which also required a database upgrade.
@@ -399,7 +399,7 @@ \section{Determine Sentiment of Tweets}
 \citep{buitinck2013api}.
 
 % Cross validation with GridSearchCV
-Furthermore \emph{scikit-learn} provides some helpers to find the best hyper-parameters for the given problem.
+Furthermore, \emph{scikit-learn} provides some helpers to find the best hyper-parameters for the given problem.
 The user can define which values various hyper-parameters can attain and the helper then perform test runs for various combinations, calculate their score and keep acting as the best performing model.
 Therefore, this type of search is called \emph{model selection}.
 \emph{Scikit-learn} provides two different model selection helpers: \emph{GridSearchCV} and \emph{RandomizedSearchCV}.
@@ -542,7 +542,7 @@ \section{Determine Sentiment of Tweets}
     The stock prices are already in a time series format on a daily basis except for weekends or holidays but the problem of missing entries are tackled later.
     First, the sentiment analysis result dataset must be condensed to form a time series for comparison.
     Therefore the results per tweet are grouped per day and summed up.
-    As negative sentiments have the value \texttt{'-1'} and positive sentiments have the value \texttt{'1'} we receive a number which is positive in case more positive than negative tweets have been published on that given day and vice versa. 
+    As negative sentiments have the value \texttt{'-1'} and positive sentiments have the value \texttt{'1'}, the resulting sum is positive if more positive than negative tweets have been published on that given day and negative in the opposite case.
 
     Missing stock prices have been calculated iteratively and the gaps have been filled by using the following procedure:
     Given $x$ is a stock price value and $y$ is the next present value with one or more values in between are missing.