Simplified C Scanner

This is a scanner (lexical analyzer) for the simplified C language, written in C++.

NFA Design

The NFA was designed around the tokens and grammar of the simplified C language.

This is the rough NFA that recognizes the tokens of the simplified C language. Note that the NFA is not implemented in the code. It only guides the construction of the DFA, which is what the scanner actually uses.

DFA Design

The DFA was derived from the rough NFA.

This is the rough DFA, which is implicitly hardcoded into the scanner.
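As an illustration of what "implicitly hardcoded" can mean, here is a minimal sketch in which the DFA's states are branches and loops instead of a transition table. The names StartKind, classify_start and run_number_state are hypothetical and do not come from the scanner's source, and the character classes (digits start a number, letters start a word) are assumptions about the simplified C language.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical classification of the edge taken out of the start state.
enum class StartKind { Number, Word, Whitespace, Symbol };

// The start state of the DFA expressed as plain branching:
// the first character decides which branch of the automaton is entered.
StartKind classify_start(char c) {
    unsigned char uc = static_cast<unsigned char>(c);
    if (std::isdigit(uc)) return StartKind::Number;      // assumed: digits begin a number
    if (std::isalpha(uc)) return StartKind::Word;        // assumed: letters begin an ID or reserved word
    if (std::isspace(uc)) return StartKind::Whitespace;
    return StartKind::Symbol;                            // everything else is treated as a symbol
}

// The "number" state expressed as a loop: stay in the state while digits
// arrive, and leave it on the first character with no outgoing edge.
// Advances pos past the lexeme and returns the lexeme.
std::string run_number_state(const std::string &src, std::size_t &pos) {
    assert(pos < src.size());
    std::string lexeme;
    while (pos < src.size() && std::isdigit(static_cast<unsigned char>(src[pos]))) {
        lexeme += src[pos++];
    }
    return lexeme;
}
```

With this style there is no explicit state variable: the position in the code plays the role of the current DFA state.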

Scanner Design

A class encapsulates the scanner. Its constructor takes the address of a vector, and scanning fills that vector with the tokens read from the input file.

class Scanner {
    private:
        std::ifstream in_file; // input file
        std::vector<Token> *out_vector; // pointer to output vector
        char cur_char; // current character being examined
        std::string char_buffer; // character buffer to analyze tokens

        // Member functions
        Token read_number();
        Token read_string();
        Token read_special_symbol();

    public:
        // Ban copying and assignment
        Scanner(Scanner const &) = delete;
        Scanner & operator=(Scanner const &) = delete;

        // Scans the file and puts the tokens into the output vector
        void scan();

        // Prints out token vector
        void print();
        void print(std::string out_file);

        // Constructor
        Scanner(std::string file_name, std::vector<Token> *out_vector);

        // Destructor
        ~Scanner();
};

The scanner reads the input character by character, following the rough DFA outlined above. In read_string(), the scanner appends the characters it reads to a string buffer until the next character can no longer continue an ID token (state 1 has no outgoing edge for it). The scanner then checks the buffered string against the reserved words to decide whether the token is an ID or a reserved word. Multi-character special symbols are handled by a similar mechanism.
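The buffer-then-classify step can be sketched as follows. This is not the scanner's actual read_string() or read_special_symbol(): it works on an in-memory string instead of std::ifstream, and WordToken, scan_word, scan_symbol, the reserved-word list, and the two-character symbol list are all assumptions made for the example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>

enum class WordKind { Identifier, ReservedWord };

// Hypothetical stand-in for the scanner's Token type.
struct WordToken {
    WordKind kind;
    std::string lexeme;
};

// Assumed reserved words; the real list depends on the simplified C grammar.
const std::unordered_set<std::string> kReservedWords = {
    "int", "void", "if", "else", "while", "return"};

// Assumed two-character special symbols.
const std::unordered_set<std::string> kTwoCharSymbols = {"==", "!=", "<=", ">="};

// Buffers letters and digits until a character breaks the ID pattern,
// then decides between ID and reserved word with one set lookup.
WordToken scan_word(const std::string &src, std::size_t &pos) {
    assert(pos < src.size() && std::isalpha(static_cast<unsigned char>(src[pos])));
    std::string buffer;
    while (pos < src.size() && std::isalnum(static_cast<unsigned char>(src[pos]))) {
        buffer += src[pos++];
    }
    WordKind kind = kReservedWords.count(buffer) ? WordKind::ReservedWord
                                                 : WordKind::Identifier;
    return {kind, buffer};
}

// Same idea for symbols: peek one character ahead and prefer the longer
// match when the two characters together form a known symbol.
std::string scan_symbol(const std::string &src, std::size_t &pos) {
    assert(pos < src.size());
    if (pos + 1 < src.size()) {
        std::string two = src.substr(pos, 2);
        if (kTwoCharSymbols.count(two)) {
            pos += 2;
            return two;
        }
    }
    return std::string(1, src[pos++]);
}
```

Classifying after the whole word has been buffered keeps the DFA small: reserved words need no states of their own, only a lookup once the ID state is left.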

Running

Run make or make all to build the scanner, then run it as ./scanner <input_file>. The scanner accepts multiple input files and writes each result to an output file in the output/ directory, following the naming convention ans1.c1.
