Report/report.tex: 6 additions, 1 deletion
@@ -232,6 +232,8 @@ \subsection{Creating Training and Testing Corpus}
232
232
\noalign{\smallskip}\hline
233
233
\end{tabular}
234
234
\end{table}
235
+
\subsection{LDA Model Development}
236
+
The LDA\cite{lda} model is available in the \textbf{gensim} package for Python. The library method was not entirely straightforward to use: it requires a \textbf{vectorized bag-of-words} corpus as input, together with a dictionary built from the available training corpus. The parameters that could be tuned while developing the model were the corpus size and the total number of topics to extract. We chose \textbf{7 topics}. In an ordinary vectorized corpus, the dimensionality would equal the size of the dictionary, which is very large. Fixing the number of topics reduces the dimensionality of the training corpus to just 7 topic features per document. The resulting topic probability distributions were used as features to build a new training corpus, which was then used to train off-the-shelf classifiers such as \textit{MultinomialNaiveBayes, LogisticRegression, RandomForestClassifier, AdaBoostClassifier}. The performance of these models is discussed in a separate section on Model Evaluation.
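The data shapes involved can be illustrated with a minimal, standard-library-only sketch. It is not the code used for this report: the helper names, the toy reviews, and the example topic distribution are all hypothetical. It builds a token-to-id dictionary, converts each document to sparse (token id, count) pairs, which is the bag-of-words format that gensim's \texttt{Dictionary.doc2bow} also produces, and expands a sparse topic distribution into a fixed 7-dimensional feature row of the kind that could be passed to a classifier.

```python
from collections import Counter

NUM_TOPICS = 7  # topic count chosen in the report


def build_dictionary(tokenized_docs):
    """Map each distinct token to an integer id, in order of first appearance."""
    dictionary = {}
    for tokens in tokenized_docs:
        for token in tokens:
            if token not in dictionary:
                dictionary[token] = len(dictionary)
    return dictionary


def doc_to_bow(tokens, dictionary):
    """Return a sparse bag-of-words vector: sorted (token_id, count) pairs."""
    counts = Counter(dictionary[t] for t in tokens if t in dictionary)
    return sorted(counts.items())


def dense_topic_features(sparse_topics, num_topics=NUM_TOPICS):
    """Expand sparse (topic_id, probability) pairs into a fixed-length vector."""
    features = [0.0] * num_topics
    for topic_id, prob in sparse_topics:
        features[topic_id] = prob
    return features


# Hypothetical toy reviews, already tokenized (assumption, not report data).
docs = [["great", "food", "great", "service"], ["slow", "service"]]
dictionary = build_dictionary(docs)
bow_corpus = [doc_to_bow(d, dictionary) for d in docs]

# Hypothetical sparse topic distribution for one document, in the
# (topic_id, probability) form an LDA model might return.
example_topics = [(1, 0.25), (4, 0.75)]
feature_row = dense_topic_features(example_topics)
```

Whatever the dictionary size, every document ends up as a row of exactly 7 numbers, which is the dimensionality reduction described above.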
235
237
236
238
\newpage
237
239
\begin{thebibliography}{}
@@ -241,7 +243,10 @@ \subsection{Creating Training and Testing Corpus}
241
243
\bibitem{yelp_dataset_challenge}
242
244
The ggplot library is used to plot most of the graphs in this assignment \url{http://ggplot2.org/}.
243
245
244
-
\bibitem{nltk}
246
+
\bibitem{nltk}
247
+
The ggplot library is used to plot most of the graphs in this assignment \url{http://ggplot2.org/}.
248
+
249
+
\bibitem{lda}
245
250
The ggplot library is used to plot most of the graphs in this assignment \url{http://ggplot2.org/}.