Creating the Next Cancer Emoji Language, Emotifuck (Basics of ProgLang design)
Creating the Next Cancer Emoji Language, Emotifuck (Basics of ProgLang design) #1
I recently re-created brainfuck in Rust with emojis, and it turned out to be a pretty interesting and useful introduction to simple programming language design. It's not a complex language: it only has eight instructions and took under 12 hours of work over one weekend, so the source code is pretty easy to understand.

Nevertheless, for what the language is, it's terribly over-engineered. However, I think that makes it a better example of how a 'real life' language might work.

For anyone who wants to jump straight to the source code, it's here: (Not on my GitHub; I created it with a friend.)

This language is interpreted rather than compiled, and consists entirely of emojis. The README has a nice description of exactly which emoji does what.

There are two main parts to this language: the parser and the interpreter.

Inside the interpreter are two functions: 'compile' and 'interpret'. The compile function walks the instructions received from the parser and pushes them onto a Vector called 'program'; a second Vector is used as a stack to pair up the jump instructions. It 'turns' the instructions from the file into 'machine readable' instructions (not actually, but it simulates this). Then the interpret function actually 'executes' these instructions. The entire thing essentially acts as a type of Turing Machine. If you don't know what that is, I'd suggest watching a YouTube video or reading up; there's a ton of info on those things. But basically, depending on the instruction, it may move to another cell of memory, change the value in the current cell, send it as 'output', or accept 'input' into it. Memory, in this case, is simulated with the array 'data'.
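
The jump-pairing step described above can be looked at in isolation. This is only a minimal sketch, not the project's code: the `Op` enum and the `match_jumps` function are names made up for illustration, and instead of patching operands in place it returns a table of partner indices. The idea is the same, though: push the index of every forward jump onto a stack, and pop it when the matching backward jump shows up.

```rust
/// Hypothetical stand-in for the instruction kinds; only the two jumps matter here.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    JumpForward,
    JumpBackward,
    Other,
}

/// For every instruction, return the index of its matching jump (None for non-jumps).
/// Returns an error if the jumps are unbalanced.
fn match_jumps(ops: &[Op]) -> Result<Vec<Option<usize>>, String> {
    let mut targets = vec![None; ops.len()];
    // Indices of forward jumps that have not been closed yet.
    let mut stack: Vec<usize> = Vec::new();
    for (pc, op) in ops.iter().enumerate() {
        match op {
            Op::JumpForward => stack.push(pc),
            Op::JumpBackward => {
                let open = stack
                    .pop()
                    .ok_or_else(|| format!("unmatched JumpBackward at {}", pc))?;
                targets[open] = Some(pc); // forward jump skips to the loop end
                targets[pc] = Some(open); // backward jump returns to the loop start
            }
            Op::Other => {}
        }
    }
    if let Some(open) = stack.pop() {
        return Err(format!("unmatched JumpForward at {}", open));
    }
    Ok(targets)
}

fn main() {
    use Op::*;
    // Same shape as the brainfuck program "+[-[+]]".
    let ops = [Other, JumpForward, Other, JumpForward, Other, JumpBackward, JumpBackward];
    println!("{:?}", match_jumps(&ops));
}
```

Returning a Result here is a design choice for the sketch; it reports an unbalanced program to the caller instead of panicking.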

Source code for compile/interpret:
/*
 * @author InsidiousMindWroteAlmostHalfOfThisAndDebugged
 * Generate a compiled vector of intermediate code and run that sucker.
 */

use std::io;
use std::io::Read;
use std::io::Write;

use core::types::Emotifuck;

pub struct Instruction {
    pub op_code: i32,
    pub operand: i32,
}

pub struct State {
    pub program: Vec<Instruction>,
    pub stack: Vec<i32>,
}

const MOVR: i32 = 1;
const MOVL: i32 = 2;
const INC: i32 = 3;
const DEC: i32 = 4;
const JMP_F: i32 = 5;
const OUT: i32 = 6;
const IN: i32 = 7;
const JMP_BK: i32 = 8;
const DATA_SIZE: usize = 1024;

pub fn compile(instruction_vector: Vec<Emotifuck>) -> State {
    let mut program = Vec::new();
    // Holds the index of every JumpForward that has not been closed yet.
    let mut stack = Vec::new();
    // Index of the instruction about to be pushed onto `program`.
    let mut pc = 0;
    for i in instruction_vector {
        match i {
            Emotifuck::MoveRight => program.push(Instruction { op_code: MOVR, operand: 0 }),
            Emotifuck::MoveLeft => program.push(Instruction { op_code: MOVL, operand: 0 }),
            Emotifuck::Increment => program.push(Instruction { op_code: INC, operand: 0 }),
            Emotifuck::Decrement => program.push(Instruction { op_code: DEC, operand: 0 }),
            Emotifuck::JumpForward => {
                // The operand is patched in once the matching JumpBackward is found.
                program.push(Instruction { op_code: JMP_F, operand: 0 });
                stack.push(pc);
            },
            Emotifuck::Output => program.push(Instruction { op_code: OUT, operand: 0 }),
            Emotifuck::Input => program.push(Instruction { op_code: IN, operand: 0 }),
            Emotifuck::JumpBackward => {
                if let Some(jmp_pc) = stack.pop() {
                    // Point each jump at its partner.
                    program.push(Instruction { op_code: JMP_BK, operand: jmp_pc });
                    program[jmp_pc as usize].operand = pc;
                } else {
                    panic!("SOMETHING WENT WRONG!");
                }
            },
            // Not an instruction: cancel out the increment below.
            _ => pc -= 1,
        }
        pc += 1;
    }
    // op_code 0 marks the end of the program.
    program.push(Instruction { op_code: 0, operand: 0 });
    State {
        program: program,
        stack: stack,
    }
}

pub fn interpret(state: State) {
    let mut pc: usize = 0;
    let mut ptr: usize = 0;
    let program = state.program.as_slice();
    let mut data = [0; DATA_SIZE];
    'prog: loop {
        // Stop if we run off the end of the program.
        if pc >= program.len() { break 'prog }
        match program[pc].op_code {
            0 => { break 'prog },
            MOVR => ptr += 1,
            MOVL => ptr -= 1,
            DEC => { data[ptr] -= 1 },
            INC => { data[ptr] += 1 },
            OUT => {
                io::stdout().write_all(&[data[ptr] as u8]).expect("failed to write to stdout");
            },
            IN => data[ptr] = io::stdin()
                .bytes()
                .next()
                .and_then(|result| result.ok())
                .map(|byte| byte as i32)
                .unwrap_or(0), // EOF or a read error stores 0
            JMP_F => {
                // Skip the loop body when the current cell is zero.
                if data[ptr] == 0 {
                    pc = program[pc].operand as usize;
                }
            },
            JMP_BK => {
                // Go back to the loop start while the current cell is non-zero.
                if data[ptr] != 0 {
                    pc = program[pc].operand as usize;
                }
            },
            _ => {}
        }
        pc += 1;
    }
}

I'm much more familiar with the parser, though.

The parser (which is the part I created) takes a file and translates it into a set of human-readable enums in Rust. It's a PEG parser: a simple Parsing Expression Grammar that takes the UTF-8 emoji characters found in a file and converts them into an enum containing human-readable names for the instructions (this is mostly for better coding practice, and to make it clearer what does what).

A PEG, for those who may not know, is just a set of rules created in order to recognize strings of characters. Most programming languages do not use a PEG, since its use is too constrained for most use cases. For Emotifuck, though, it worked just fine. If you are at all familiar with regex, the rules that make up this Parsing Expression Grammar are very similar, except that they are tightly integrated with the Rust ecosystem and follow PEG conventions. Like Rust's regex engine, a PEG parser can run in O(n) time (in general that guarantee depends on packrat-style memoization; a grammar this simple is linear either way).

In CS terms, a PEG is just an analytic formal grammar. If this sounds interesting, read up on the Chomsky hierarchy of formal languages. (This is more theory than hard programming.) TBH I'm not too interested in the theory side of things, but programming shit like this is a good learning exp sometimes.

Here's the source for the grammar:

// @author InsidiousMind
/// everything except Unicode Emojis (Rustpeg doesn't include that in the . rule)
anything -> super::Emotifuck
  = . { super::Emotifuck::Nothing }

move_right -> super::Emotifuck
  = '\u{1F525}' { super::Emotifuck::MoveRight }

move_left -> super::Emotifuck
  = '\u{1F4AF}' { super::Emotifuck::MoveLeft }

decrement -> super::Emotifuck
  = '\u{1F4A9}' { super::Emotifuck::Decrement }

//     U+1F602
increment -> super::Emotifuck
  = '\u{1F602}' { super::Emotifuck::Increment }

output -> super::Emotifuck
  = '\u{1F49E}' { super::Emotifuck::Output }

input -> super::Emotifuck
  = '\u{1F64F}' { super::Emotifuck::Input }

jump_forward -> super::Emotifuck
  = '\u{1F31A}' { super::Emotifuck::JumpForward }

jump_backward -> super::Emotifuck
  = '\u{1F438}' { super::Emotifuck::JumpBackward }

content -> Vec<super::Emotifuck>
  = (move_left / move_right / decrement / increment / output / input / jump_forward / jump_backward / anything)+

Every rule is defined like so:
name -> whatRuleWillReturn
= matchForTheRule { PossibleRustExpressionToReturn}

The rules separated by / simply mean "Try this; if it does not match, try the next thing; if that doesn't match, try the next thing" and so on. The '*' character stands for '0 or more' and the '+' stands for '1 or more'.
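
For anyone curious what '/' and '+' boil down to, here is a rough hand-written sketch in plain Rust. It is not what rust-peg generates; the `Token` enum, the `Parser` alias and the `choice` / `one_or_more` helpers are all invented for this example, and only two of the emojis are included to keep it short.

```rust
/// A parser takes the remaining input and, on success, returns a value plus the rest.
type Parser<T> = fn(&str) -> Option<(T, &str)>;

/// Hypothetical token type with just enough variants for the demo.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Token {
    Fire,
    Hundred,
    Nothing,
}

fn fire(input: &str) -> Option<(Token, &str)> {
    input.strip_prefix('\u{1F525}').map(|rest| (Token::Fire, rest))
}

fn hundred(input: &str) -> Option<(Token, &str)> {
    input.strip_prefix('\u{1F4AF}').map(|rest| (Token::Hundred, rest))
}

/// Like the `.` rule: consume any single character.
fn anything(input: &str) -> Option<(Token, &str)> {
    let mut chars = input.chars();
    chars.next().map(|_| (Token::Nothing, chars.as_str()))
}

/// Ordered choice (`a / b / c`): return the first alternative that matches.
fn choice<'a>(alts: &[Parser<Token>], input: &'a str) -> Option<(Token, &'a str)> {
    alts.iter().find_map(|p| p(input))
}

/// One-or-more (`e+`): apply the choice repeatedly; fail if it never matches.
fn one_or_more<'a>(
    alts: &[Parser<Token>],
    mut input: &'a str,
) -> Option<(Vec<Token>, &'a str)> {
    let mut out = Vec::new();
    while let Some((tok, rest)) = choice(alts, input) {
        out.push(tok);
        input = rest;
    }
    if out.is_empty() { None } else { Some((out, input)) }
}

fn main() {
    // `anything` goes last, otherwise it would swallow the emojis too.
    let alts: [Parser<Token>; 3] = [fire, hundred, anything];
    let parsed = one_or_more(&alts, "\u{1F525}x\u{1F4AF}");
    println!("{:?}", parsed);
}
```

The order of the alternatives matters: a PEG never goes back to try a later alternative once an earlier one has matched, which is why the catch-all rule sits at the end of 'content' in the grammar as well.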

All that's left to do is call 'content', and out pops a convenient Vector of all the instructions in the program. An arguably easier way to do this is to just read the file directly, but a PEG is more fun.
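
For comparison, the "just read the file directly" route might look something like the sketch below. The `Emotifuck` enum here is a stand-in written for this example (the real one lives in core::types and isn't shown in this post), and `lex` is a made-up name; the code points are the ones from the grammar above.

```rust
/// Stand-in for the project's `Emotifuck` enum, assumed to have these variants.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Emotifuck {
    MoveRight,
    MoveLeft,
    Increment,
    Decrement,
    Output,
    Input,
    JumpForward,
    JumpBackward,
    Nothing,
}

/// Map every character of the source text to an instruction, one to one.
fn lex(source: &str) -> Vec<Emotifuck> {
    source
        .chars()
        .map(|c| match c {
            '\u{1F525}' => Emotifuck::MoveRight,
            '\u{1F4AF}' => Emotifuck::MoveLeft,
            '\u{1F602}' => Emotifuck::Increment,
            '\u{1F4A9}' => Emotifuck::Decrement,
            '\u{1F49E}' => Emotifuck::Output,
            '\u{1F64F}' => Emotifuck::Input,
            '\u{1F31A}' => Emotifuck::JumpForward,
            '\u{1F438}' => Emotifuck::JumpBackward,
            // Anything else is treated like the grammar's `anything` rule.
            _ => Emotifuck::Nothing,
        })
        .collect()
}

fn main() {
    let program = lex("\u{1F602}\u{1F602} \u{1F49E}");
    println!("{:?}", program);
}
```

Since every instruction is exactly one character, a single match over chars() is enough here; the grammar only starts to pay off once instructions span more than one character.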

The hello world program:

And so a really shitty programming language was born.
As an added plus, it was written in totally memory-safe Rust. So, by extension, Emotifuck is completely memory safe. (Take that, C.)

I'll edit this in some time to expand on some aspects of the language and make it more of a 'tutorial', but I gotta get back to my calculus hw. This is good for now, I think. If there's something you're particularly interested in, respond below and I'll make sure to give it some time in my expanded explanation.
(This post was last modified: 10-12-2017, 05:34 AM by insidious.)

RE: Creating the Next Cancer Emoji Language, Emotifuck (Basics of ProgLang design) #2
That's actually pretty funny, nice work. Speaking of language design, I was doing some poking around while researching for my in-progress language, and apparently Perl 6 has native support for grammars, which I think is pretty cool. Maybe not practical for a full language since Perl is interpreted, but it's still neat and definitely a cool option for the text parsing jobs it's known for.
(This post was last modified: 10-12-2017, 04:49 AM by Inori.)

