orion


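Graduate student at the University of Science and Technology of China; database kernel development engineer; productivity enthusiast.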

Implementing a simple SQLite from 0 to 1 (I. REPL)

Introduction

A REPL (Read-Eval-Print Loop) is an interactive environment that reads user input, evaluates it, and prints the result immediately. In this article, we write a simple REPL in Rust that reads commands from the console, handles them, and prints the results.

Implementation Steps

The REPL works in three steps: reading user input, parsing user input, and executing user input.

  1. Read User Input

    First, we need to loop to read user input. When the user presses the enter key, we can retrieve the content the user has entered in the console. If the user has not entered anything, we can continue to wait for user input.

  2. Parse User Input

    Once we have obtained the user's input, we need to parse it into a format that the program can understand. We first check if the user input is a meta-command (starting with "."), and if so, we execute the corresponding meta-command. If it is not a meta-command, we need to parse the SQL statement entered by the user.

  3. Execute User Input

    After parsing the SQL statement, we execute it. If execution succeeds, we print the result in the console. If execution fails, we print an error message in the console instead.

The above is the basic flow of this REPL. If you want to know more details, please check the code.
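The three steps above can be sketched as one small loop. This is a minimal sketch rather than the article's code: `Input`, `classify`, and `run_repl` are names assumed here for illustration, and the SQL branch only echoes the statement instead of executing it. The loop is generic over its reader and writer so that it can be driven by in-memory buffers as well as by the console.

```rust
use std::io::{self, BufRead, Write};

/// What the loop decided to do with one line of input (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Input {
    Empty,
    Meta(String),
    Sql(String),
}

/// Step 2: classify a raw line as empty, a meta-command, or SQL text.
fn classify(line: &str) -> Input {
    let trimmed = line.trim();
    if trimmed.is_empty() {
        Input::Empty
    } else if trimmed.starts_with('.') {
        Input::Meta(trimmed.to_string())
    } else {
        Input::Sql(trimmed.to_string())
    }
}

/// Steps 1 to 3 in one loop, reading from `input` and writing to `output`.
fn run_repl<R: BufRead, W: Write>(mut input: R, mut output: W) -> io::Result<()> {
    let mut line = String::new();
    loop {
        write!(output, "db > ")?;
        output.flush()?;
        line.clear();
        // Step 1: read one line; zero bytes read means end of input.
        if input.read_line(&mut line)? == 0 {
            return Ok(());
        }
        match classify(&line) {
            Input::Empty => continue,
            Input::Meta(cmd) if cmd == ".exit" => {
                writeln!(output, "exit, bye")?;
                return Ok(());
            }
            Input::Meta(cmd) => writeln!(output, "Unrecognized command '{}'", cmd)?,
            // Step 3 placeholder: a real executor would run the statement here.
            Input::Sql(sql) => writeln!(output, "would execute: {}", sql)?,
        }
    }
}
```

Calling `run_repl(io::stdin().lock(), io::stdout())` would wire the same loop to the console.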

Code

main.rs

mod executor;
mod parser;
use executor::{execute_statement, PrepareError, Statement};
use parser::{do_meta_command, prepare_statement, MetaCommandResult};
use std::io::{self, Write};

fn main() -> io::Result<()> {
    loop {
        print_prompt();
        io::stdout().flush()?;
        let mut cmd = String::new();
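        // read_line returns Ok(0) at end of input (for example Ctrl-D); exit instead of looping forever
        if io::stdin().read_line(&mut cmd)? == 0 {
            break;
        }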
        if cmd.trim().is_empty() {
            continue;
        }

        if cmd.starts_with('.') {
            match do_meta_command(cmd.trim()) {
                MetaCommandResult::Exit => {
                    println!("exit, bye");
                    break;
                }
                MetaCommandResult::UnrecognizedCommand => {
                    println!("Unrecognized command '{}'", cmd.trim());
                    continue;
                }
            };
        }

        let mut stmt: Statement = Statement::default();

        match prepare_statement(cmd.trim(), &mut stmt) {
            Ok(_) => {}

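            Err(PrepareError::UnrecognizedStatement) => {
                println!("Unrecognized keyword at start of '{}'", cmd.trim());
                continue;
            }
            // PrepareError has a second variant; the match must cover it to compile
            Err(PrepareError::IncorrectParamNumber) => {
                println!("Incorrect number of parameters in '{}'", cmd.trim());
                continue;
            }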
        };

        match execute_statement(&stmt) {
            Ok(_) => {
                println!("Executed.");
            }
            Err(_) => {
                println!("Error executing statement");
                continue;
            }
        }
    }

    Ok(())
}

fn print_prompt() {
    print!("db > ")
}

The prepare_statement function lives in parser.rs because it turns the SQL text entered by the user into a form the program can understand, which is syntax analysis. The execute_statement function runs the parsed statement, which is the executor's job, so it lives in executor.rs.

parser.rs

use crate::executor::{PrepareError, Statement, StatementType};
pub enum MetaCommandResult {
    Exit,
    UnrecognizedCommand,
}

pub fn do_meta_command(cmd: &str) -> MetaCommandResult {
    if cmd == ".exit" {
        MetaCommandResult::Exit
    } else {
        MetaCommandResult::UnrecognizedCommand
    }
}

pub fn prepare_statement(cmd: &str, stmt: &mut Statement) -> Result<(), PrepareError> {
    if cmd.starts_with("insert") {
        stmt.statement_type = StatementType::Insert;
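        // Split the command into whitespace-separated words
        let mut iter = cmd.split_whitespace();
        // Expect exactly four words: the "insert" keyword plus id, username, and email
        if iter.clone().count() != 4 {
            return Err(PrepareError::UnrecognizedStatement);
        }
        // Skip the "insert" keyword
        iter.next();

        // A non-numeric id is reported as an error instead of panicking the REPL
        stmt.row_to_insert.id = iter
            .next()
            .unwrap()
            .parse::<u32>()
            .map_err(|_| PrepareError::UnrecognizedStatement)?;
        stmt.row_to_insert.username = iter.next().unwrap().to_string();
        stmt.row_to_insert.email = iter.next().unwrap().to_string();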
        println!("stmt.row_to_insert: {:?}", stmt.row_to_insert);

        return Ok(());
    }
    if cmd.starts_with("select") {
        stmt.statement_type = StatementType::Select;
        return Ok(());
    }
    Err(PrepareError::UnrecognizedStatement)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_prepare_insert() {
        let mut stmt = Statement::default();
        let cmd = "insert 1 user1 user1@example.com";
        let result = prepare_statement(cmd, &mut stmt);
        assert!(result.is_ok());
        assert_eq!(stmt.statement_type, StatementType::Insert);

        // Test if each field is correct
        assert_eq!(stmt.row_to_insert.id, 1);
        assert_eq!(stmt.row_to_insert.username, "user1");
        assert_eq!(stmt.row_to_insert.email, "user1@example.com");
    }
}

This is the parser.rs file in the code, which contains functions for parsing user input and meta-commands. Meta-commands are commands that start with ".", used to control the behavior of the REPL, such as exiting the REPL.

The do_meta_command function is used to parse meta-commands. If the user inputs ".exit", the function returns MetaCommandResult::Exit, indicating that the REPL should exit; if the user inputs other meta-commands, the function returns MetaCommandResult::UnrecognizedCommand, indicating an unrecognized meta-command.
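As more meta-commands are added, the if/else chain could become a `match` on the trimmed input. The following is only a sketch of that shape: the `Help` variant and the `.help` command are hypothetical additions and are not part of the article's code.

```rust
#[derive(Debug, PartialEq, Eq)]
enum MetaCommandResult {
    Exit,
    Help, // hypothetical variant, added only for illustration
    UnrecognizedCommand,
}

/// Map a meta-command string to its result; unknown commands fall through.
fn do_meta_command(cmd: &str) -> MetaCommandResult {
    match cmd.trim() {
        ".exit" => MetaCommandResult::Exit,
        ".help" => MetaCommandResult::Help,
        _ => MetaCommandResult::UnrecognizedCommand,
    }
}
```

Because the enum gains a variant, every `match` on `MetaCommandResult` in the caller must handle it too; the compiler rejects a non-exhaustive match, which makes forgotten cases hard to miss.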

The prepare_statement function parses the SQL statement entered by the user into a format that the program can understand. If the input starts with "insert", the function fills in a Statement struct that contains the fields of the row to be inserted. If the input starts with "select", the function produces a Statement struct without any query conditions. For any other input, the function returns PrepareError::UnrecognizedStatement, indicating an unrecognized statement.

The code in the prepare_statement function implements the logic for parsing "insert" statements. First, it sets the statement_type field of the Statement struct to StatementType::Insert, indicating that a row of data is to be inserted. Then, it parses the SQL statement into an iterator, using the split_whitespace function to split it into words. If the number of words is not 4, the function returns PrepareError::UnrecognizedStatement, indicating an unrecognized statement. Otherwise, the function assigns the parsed values to the row_to_insert field of the stmt struct, including the id, username, and email fields.
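The same word-splitting logic can also be written with a slice pattern, which checks the keyword and the word count in one step and needs no `unwrap`. This is a sketch under assumptions: `parse_insert` and `ParseError` are hypothetical names, the `Row` here is a local stand-in for the one in executor.rs, and unlike the `starts_with` check it requires the first word to be exactly "insert".

```rust
#[derive(Debug, PartialEq, Eq)]
enum ParseError {
    NotInsert,
    WrongArgCount(usize),
    InvalidId(String),
}

#[derive(Debug, PartialEq, Eq)]
struct Row {
    id: u32,
    username: String,
    email: String,
}

/// Parse "insert <id> <username> <email>" into a Row without panicking.
fn parse_insert(cmd: &str) -> Result<Row, ParseError> {
    let words: Vec<&str> = cmd.split_whitespace().collect();
    match words.as_slice() {
        // Exactly four words, the first being the keyword.
        ["insert", id, username, email] => {
            let id = id
                .parse::<u32>()
                .map_err(|_| ParseError::InvalidId(id.to_string()))?;
            Ok(Row {
                id,
                username: username.to_string(),
                email: email.to_string(),
            })
        }
        // Keyword matched, but the number of arguments is not three.
        ["insert", rest @ ..] => Err(ParseError::WrongArgCount(rest.len())),
        _ => Err(ParseError::NotInsert),
    }
}
```

Returning the row instead of filling a `&mut Statement` is a design choice of this sketch; it keeps a half-filled statement from being observable when parsing fails.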

In the tests, test_prepare_insert checks that the prepare_statement function correctly parses an "insert" statement and assigns the parsed values to the row_to_insert field of the stmt struct.

executor.rs

#![allow(dead_code)]

use serde::{Deserialize, Serialize};
use std::fmt::Display;

#[derive(Debug, Eq, PartialEq)]
pub enum StatementType {
    Insert,
    Select,
    Unrecognized,
}

#[derive(Debug, Eq, PartialEq)]
pub enum PrepareError {
    UnrecognizedStatement,
    IncorrectParamNumber,
}
impl Display for PrepareError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            PrepareError::UnrecognizedStatement => {
                write!(f, "UnrecognizedStatement")
            }
            PrepareError::IncorrectParamNumber => {
                write!(f, "IncorrectParamNumber")
            }
        }
    }
}

#[derive(Debug, Eq, PartialEq)]
pub enum ExecuteError {
    UnrecognizedStatement,
    Failure(String),
}

impl Display for ExecuteError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            ExecuteError::UnrecognizedStatement => {
                write!(f, "UnrecognizedStatement")
            }
            ExecuteError::Failure(s) => {
                write!(f, "Failure: {}", s)
            }
        }
    }
}

pub enum ExecuteResult {
    Record(Vec<Row>),
    Affected(u32),
}
pub struct Statement {
    pub row_to_insert: Row,
    pub statement_type: StatementType,
}

#[derive(Serialize, Deserialize, Default, Debug, Clone)]
pub struct Row {
    pub id: u32,
    pub email: String,
    pub username: String,
}
impl Default for Statement {
    fn default() -> Self {
        Statement {
            statement_type: StatementType::Unrecognized,
            row_to_insert: Row::default(),
        }
    }
}

pub fn execute_statement(stmt: &Statement) -> Result<ExecuteResult, ExecuteError> {
    match stmt.statement_type {
        StatementType::Insert => execute_insert(stmt),
        StatementType::Select => execute_select(stmt),
        StatementType::Unrecognized => Err(ExecuteError::UnrecognizedStatement),
    }
}

fn execute_insert(_stmt: &Statement) -> Result<ExecuteResult, ExecuteError> {
    println!("This is where we would do an insert.");
    Err(ExecuteError::Failure("unimplemented".to_owned()))
}

fn execute_select(_stmt: &Statement) -> Result<ExecuteResult, ExecuteError> {
    println!("This is where we would do a select.");
    Err(ExecuteError::Failure("unimplemented".to_owned()))
}

This is the executor.rs file in the code, which contains functions for executing user-inputted SQL statements. The execute_statement function calls the corresponding function to execute the statement based on the input statement type. If the statement type is "insert", the function calls execute_insert to perform the insert operation; if the statement type is "select", the function calls execute_select to perform the query operation; if the statement type is another value, the function returns ExecuteError::UnrecognizedStatement, indicating an unrecognized statement type.

The execute_insert and execute_select functions are unimplemented placeholders that simply print a message and return an error, so at this stage the REPL reports "Error executing statement" for every insert or select.
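One way the placeholders might later be filled in is with an in-memory table. The sketch below is an assumption, not the article's design: `Table` is a hypothetical type, the types are local stand-ins for those in executor.rs, and rejecting a duplicate id is a choice made here for illustration.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
struct Row {
    id: u32,
    username: String,
    email: String,
}

#[derive(Debug, PartialEq, Eq)]
enum ExecuteResult {
    Record(Vec<Row>),
    Affected(u32),
}

#[derive(Debug, PartialEq, Eq)]
enum ExecuteError {
    Failure(String),
}

/// Hypothetical in-memory storage: rows are kept in insertion order.
#[derive(Default)]
struct Table {
    rows: Vec<Row>,
}

impl Table {
    /// Append a row, refusing ids that are already present (assumed rule).
    fn execute_insert(&mut self, row: Row) -> Result<ExecuteResult, ExecuteError> {
        if self.rows.iter().any(|r| r.id == row.id) {
            return Err(ExecuteError::Failure(format!("duplicate id {}", row.id)));
        }
        self.rows.push(row);
        Ok(ExecuteResult::Affected(1))
    }

    /// Return a copy of every stored row.
    fn execute_select(&self) -> Result<ExecuteResult, ExecuteError> {
        Ok(ExecuteResult::Record(self.rows.clone()))
    }
}
```

With something like this in place, the REPL could print the rows from `ExecuteResult::Record` or the count from `ExecuteResult::Affected` instead of a fixed "Executed." message.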

The Row struct represents a row of data, containing the id, email, and username fields. The Statement struct represents a SQL statement, containing the type of statement to be executed and the row data to be inserted. The StatementType enum represents the types of SQL statements: Insert, Select, and Unrecognized, which is the default value before a statement has been parsed. The PrepareError and ExecuteError enums represent errors that may occur during statement preparation and execution, including unrecognized statements and incorrect parameter counts. Because Row derives Serialize and Deserialize, the project needs the serde crate with its derive feature enabled.

Both PrepareError and ExecuteError implement the Display trait; their fmt methods format the error messages.
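A Display implementation pays off once the error text is actually printed. The sketch below repeats ExecuteError so that it stands alone, then shows two uses: formatting the error with `{}` and, after adding an empty `std::error::Error` impl, propagating it through `Box<dyn Error>`. The helpers `report`, `always_fails`, and `run` are hypothetical and exist only for illustration.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug, PartialEq, Eq)]
enum ExecuteError {
    UnrecognizedStatement,
    Failure(String),
}

impl fmt::Display for ExecuteError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ExecuteError::UnrecognizedStatement => write!(f, "UnrecognizedStatement"),
            ExecuteError::Failure(s) => write!(f, "Failure: {}", s),
        }
    }
}

// An empty impl is enough: Error only requires Debug and Display.
impl Error for ExecuteError {}

/// Build the line a REPL could print, including the error's own message.
fn report(err: &ExecuteError) -> String {
    format!("Error executing statement: {}", err)
}

/// Stand-in for a statement that cannot be executed yet.
fn always_fails() -> Result<(), ExecuteError> {
    Err(ExecuteError::Failure("unimplemented".to_owned()))
}

/// `?` converts ExecuteError into Box<dyn Error> because of the Error impl.
fn run() -> Result<(), Box<dyn Error>> {
    always_fails()?;
    Ok(())
}
```

In main.rs, this would let the `Err(_)` arm become `Err(e)` and print `e` rather than a fixed message.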

Summary

This article mainly introduces the basic flow and code implementation of REPL, including parsing user-inputted SQL statements and executing statements. The parser.rs file in the code contains functions for parsing user input and meta-commands, while the executor.rs file contains functions for executing user-inputted SQL statements.
