Source Code Crawler

Source Code Crawler is a demo application for experimenting with concurrency. The crawler walks a project directory and builds an inverted index that maps each superclass to its descendants.

It includes the following modules:
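To make the idea of the inverted index concrete, here is a minimal single-threaded sketch. It is not the project's actual code: the class name `SuperclassIndexSketch`, the `index` method, and the simplified regular expression are all assumptions made for illustration, and the sketch records direct subclasses only.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; the real modules may parse sources differently.
final class SuperclassIndexSketch {

    // Simplified assumption: matches "class Child extends Parent" and
    // ignores generics, interfaces, and qualified names.
    private static final Pattern EXTENDS =
            Pattern.compile("\\bclass\\s+(\\w+)\\s+extends\\s+(\\w+)");

    // Builds superclass -> direct subclasses from the given source texts.
    static Map<String, SortedSet<String>> index(List<String> sources) {
        Map<String, SortedSet<String>> index = new TreeMap<>();
        for (String source : sources) {
            Matcher m = EXTENDS.matcher(source);
            while (m.find()) {
                // group(2) is the superclass, group(1) is the subclass
                index.computeIfAbsent(m.group(2), k -> new TreeSet<>())
                     .add(m.group(1));
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> sources = List.of(
                "class Dog extends Animal {}",
                "class Cat extends Animal {}",
                "class Puppy extends Dog {}");
        // prints {Animal=[Cat, Dog], Dog=[Puppy]}
        System.out.println(index(sources));
    }
}
```

Sorted collections are used here only so that the printed output is stable; they are not required by the technique.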

Step0 - counts word occurrences in a file

Step1 - single-threaded version

Step2 - each file is indexed in a separate thread; each thread prints its local index

Step3 - each file is indexed in a separate thread; the shared index is guarded by a lock

Step4 - an ExecutorService is used

Step5 - map and reduce threads are separated by queues

Step6 - the Apache Hadoop MapReduce library is used

Step7 - the Akka library is used

Step8 - a parallel stream from the Java Stream API is used

Step9 - Apache Spark is used
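Two of the approaches listed above can be sketched side by side. This is a hedged illustration rather than the code of any step module: the class name `ConcurrentIndexSketch`, both method names, the pool size of 4, the 10-second timeout, and the simplified regular expression are assumptions. The first method combines the ideas of Step3 and Step4 (one task per file, shared index guarded by a lock); the second follows the idea of Step8. `Matcher.results()` requires Java 9 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical illustration; not taken from the step modules.
final class ConcurrentIndexSketch {

    // Simplified assumption: matches "class Child extends Parent" only.
    private static final Pattern EXTENDS =
            Pattern.compile("\\bclass\\s+(\\w+)\\s+extends\\s+(\\w+)");

    // One task per source text; every write to the shared index holds the lock.
    static Map<String, Set<String>> indexWithExecutor(List<String> sources) {
        Map<String, Set<String>> shared = new TreeMap<>();
        ReentrantLock lock = new ReentrantLock();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String source : sources) {
            pool.execute(() -> {
                Matcher m = EXTENDS.matcher(source);
                while (m.find()) {
                    lock.lock();
                    try {
                        shared.computeIfAbsent(m.group(2), k -> new TreeSet<>())
                              .add(m.group(1));
                    } finally {
                        lock.unlock();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        }
        // Read under the same lock so the workers' writes are visible here.
        lock.lock();
        try {
            return new TreeMap<>(shared);
        } finally {
            lock.unlock();
        }
    }

    // The stream framework splits the work and merges the partial results.
    static Map<String, Set<String>> indexWithParallelStream(List<String> sources) {
        return sources.parallelStream()
                .flatMap(source -> EXTENDS.matcher(source).results())
                .collect(Collectors.groupingBy(
                        r -> r.group(2),
                        Collectors.mapping(r -> r.group(1), Collectors.toSet())));
    }

    public static void main(String[] args) {
        List<String> sources = List.of(
                "class Dog extends Animal {}",
                "class Cat extends Animal {}",
                "class Puppy extends Dog {}");
        System.out.println(indexWithExecutor(sources));
        System.out.println(indexWithParallelStream(sources));
    }
}
```

The lock-based variant keeps one mutable index and serializes writes to it, while the stream variant avoids shared mutable state by letting the collector merge per-thread results. Both should yield the same mapping for the same input.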

Build

./gradlew clean build

Step 6, Step 9

Download and unzip Apache Hadoop to your home directory.

Add the Hadoop bin directory to PATH.

In the Step6 (or Step9) project directory, run the following commands:

hdfs dfs -mkdir input
hdfs dfs -put {PATH_TO_PROJECT}/source-code-crawler-step1/src/main/java/org/interactiverobotics/source_code_crawler/step1/dummy/*.java input

The input directory must be created in the step's project directory.

Run

./gradlew :source-code-crawler-step1:run

where source-code-crawler-step1 is the name of the step module. Program arguments can be passed with -PprogramArgs, for example:

./gradlew :source-code-crawler-step6:run -PprogramArgs=input,output

License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
