Overview
Sun is a dynamically typed, interpreted language built entirely from scratch. This documentation covers both the language itself and the internals of the interpreter, which is useful for anyone who wants to understand the implementation.
Sun is implemented in Java, uses ahead-of-time (AOT) compilation to generate the executable, and is built around an AST-based pipeline. The primary objective of the project is to learn the basics of how an interpreter works.
Installation
Download the prebuilt executable from the project's Releases page:
https://github.com/Sunny-esc/Sun-Lang/releases
To run from source instead:
git clone https://github.com/Sunny-esc/Sun-Lang
cd Sun-Lang && cd Sun
java Lox.java
Hello, World
// hello world in Sun language
print "Hello, world!";
Basics
The basic syntax of Sun includes comments, whitespace, identifiers, and literals. Whitespace is ignored and used only for readability. Comments help document code, and literals represent fixed values such as numbers, strings, and booleans.
// Single-line comment
/* Block comment */
// Variable declarations with literals
var x = 42; // number
var name = "Sun"; // string
var flag = true; // boolean
var nothing = nil; // null value
Expressions
Expressions produce values. Sun supports arithmetic, comparison, equality, and logical expressions to perform calculations and make decisions in programs.
Arithmetic expressions
1 + 2; // 3
5 - 3; // 2
4 * 2; // 8
8 / 2; // 4
Comparison expressions
3 < 5; // true
5 <= 5; // true
7 > 2; // true
4 >= 6; // false
Equality expressions
1 == 2; // false
"cat" != "dog"; // true
Logical expressions
!true; // false
!false; // true
true and false; // false
true or false; // true
Statements
Statements perform actions in a program. In Sun, common statements include variable declarations, printing values, expressions, and block statements for grouping multiple operations together.
// Variable declaration
var x = 10;
var name = "Sunny";
// Print statement
print x;
print name;
// Expression statement
x = x + 5;
// Block statement (scope)
{
var y = 20;
print y;
}
// Conditional statement
if (x > 10) {
print "x is greater than 10";
} else {
print "x is small";
}
Functions
Functions let you group reusable logic. In Sun, functions are declared using the fun keyword. They can take parameters, perform operations, and optionally return values.
// Function without return value
fun printSum(a, b) {
print a + b;
}
// Function with return value
fun returnSum(a, b) {
return a + b;
}
// Calling functions
printSum(1, 2);
var result = returnSum(2, 3);
print result;
Control Flow
if (x > 0) {
print "x is positive";
} else {
print "x is zero or negative";
}
while (x < 10) {
x = x + 1;
}
for (var a = 1; a < 10; a = a + 1) {
print a;
}
Class
A class defines a blueprint for creating objects by bundling behavior (methods) and state (fields). In Sun, classes are first-class values, meaning they can be stored in variables, passed to functions, and invoked like functions to create instances.
class Breakfast {
cook() {
print "Eggs a-fryin'!";
}
serve(who) {
print "Enjoy your breakfast, " + who + ".";
}
}
// Classes are first-class values
var someVariable = Breakfast;
someFunction(Breakfast);
// Creating an instance
var breakfast = Breakfast();
print breakfast; // prints: Breakfast instance
// Adding fields dynamically
breakfast.meat = "sausage";
breakfast.bread = "sourdough";
// Using 'this' inside methods
class Breakfast {
serve(who) {
print "Enjoy your " + this.meat + " and " +
this.bread + ", " + who + ".";
}
}
// Initializer (constructor)
class Breakfast {
init(meat, bread) {
this.meat = meat;
this.bread = bread;
}
}
var baconAndToast = Breakfast("bacon", "toast");
baconAndToast.serve("Dear Reader");
// "Enjoy your bacon and toast, Dear Reader."
Inheritance
Inheritance allows a class to reuse behavior from another class. In Sun, a subclass is defined using the < operator, where the subclass inherits all methods from its superclass. This enables code reuse and extension of existing behavior.
class Brunch < Breakfast {
drink() {
print "How about a Bloody Mary?";
}
}
// Creating an instance of subclass
var benedict = Brunch("ham", "English muffin");
benedict.serve("Noble Reader");
// Using super to call superclass methods
class Brunch < Breakfast {
init(meat, bread, drink) {
super.init(meat, bread);
this.drink = drink;
}
}
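One common way for an interpreter to support this (the design used by Lox, which the `Lox.java` entry point suggests Sun builds on) is to give each class a method table plus a reference to its superclass, and to walk up that chain when looking up a method. The Java below is a minimal sketch under that assumption; `SunClass`, `SunFunction`, `defineMethod`, and `findMethod` are hypothetical names, not confirmed parts of the Sun codebase.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a user-defined method. A real interpreter would
// store the declaration and closure; here we only record the owning class.
record SunFunction(String name, String ownerClass) {}

// Hypothetical runtime representation of a Sun class.
class SunClass {
    final String name;
    final SunClass superclass; // null when the class has no superclass
    private final Map<String, SunFunction> methods = new HashMap<>();

    SunClass(String name, SunClass superclass) {
        this.name = name;
        this.superclass = superclass;
    }

    void defineMethod(String methodName) {
        methods.put(methodName, new SunFunction(methodName, name));
    }

    // Check this class first, then delegate up the superclass chain.
    // Returns null if no class in the chain defines the method.
    SunFunction findMethod(String methodName) {
        SunFunction method = methods.get(methodName);
        if (method != null) return method;
        if (superclass != null) return superclass.findMethod(methodName);
        return null;
    }

    public static void main(String[] args) {
        SunClass breakfast = new SunClass("Breakfast", null);
        breakfast.defineMethod("init");
        breakfast.defineMethod("serve");

        SunClass brunch = new SunClass("Brunch", breakfast);
        brunch.defineMethod("init"); // overrides Breakfast.init
        brunch.defineMethod("drink");

        System.out.println(brunch.findMethod("serve").ownerClass()); // Breakfast
        System.out.println(brunch.findMethod("init").ownerClass());  // Brunch
        // super.init(...) inside Brunch would start the lookup one level up:
        System.out.println(brunch.superclass.findMethod("init").ownerClass()); // Breakfast
    }
}
```

Because the subclass's own table is checked first, overriding `init` needs no special handling, and a `super` call can simply begin its search at the superclass.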
Ambiguity & Expression Parsing
Ambiguity in Parsing
When parsing expressions, ambiguity arises when a single sequence of tokens can be interpreted in multiple ways.
The parser’s role is not only to validate syntax but also to determine how different parts of the input relate to the grammar.
Without clear rules, the parser may construct different syntax trees for the same expression, leading to different evaluation results.
Operator Precedence
Precedence defines the order in which different operators are evaluated. Operators with higher precedence are evaluated before those with lower precedence; they “bind tighter” to their operands.
For example, in an expression combining division and subtraction, division is evaluated first because of its higher precedence: 6 / 3 - 1 is read as (6 / 3) - 1, which gives 1, not as 6 / (3 - 1), which gives 3.
Associativity
Associativity determines evaluation order when multiple operators of the same type appear in sequence.
- Left-associative: evaluation proceeds from left to right
- Right-associative: evaluation proceeds from right to left
For example, subtraction is left-associative, so
5 - 3 - 1
is interpreted as:
(5 - 3) - 1
which evaluates to 1 rather than 3.
Without well-defined precedence and associativity rules, expressions become ambiguous and unreliable.
Operator Hierarchy
Operator levels, from lowest to highest precedence:
| Name | Operators | Associativity |
|---|---|---|
| Equality | == != | Left |
| Comparison | > >= < <= | Left |
| Term | - + | Left |
| Factor | / * | Left |
| Unary | ! - | Right |
Recursive Descent Parsing
There are many parsing techniques, such as LL, LR, LALR, and parser combinators, but this interpreter uses a simpler and highly effective approach: recursive descent parsing.
Recursive descent is a top-down parsing technique. It starts from the highest-level grammar rule (typically expression) and progressively breaks it down into smaller sub-expressions until it reaches the most basic elements of the syntax tree.
This method relies on straightforward, handwritten code instead of parser generators such as Yacc, Bison, or ANTLR.
Despite its simplicity, recursive descent is:
- Efficient and fast
- Easy to understand and maintain
- Capable of handling complex grammar structures
- Well-suited for detailed error reporting
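To make the idea concrete, here is a small Java sketch (not Sun's actual parser) with one method per row of the operator table, where each method parses its operands by calling the next-higher precedence level. It returns a fully parenthesized string instead of AST nodes, and its tokenizer only handles integers, parentheses, and the table's operators; both are simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// One method per precedence level (lowest to highest); each level calls the
// next-higher one for its operands. Returns a fully parenthesized string.
class PrecedenceParser {
    private final List<String> tokens;
    private int current = 0;
    private PrecedenceParser(List<String> tokens) { this.tokens = tokens; }

    static String parse(String source) {
        PrecedenceParser parser = new PrecedenceParser(tokenize(source));
        String result = parser.expression();
        if (parser.current < parser.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + parser.tokens.get(parser.current));
        }
        return result;
    }

    // Integers, parentheses, and one- or two-character operators only.
    private static List<String> tokenize(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                out.add(src.substring(start, i));
            } else if ("=!<>".indexOf(c) >= 0 && i + 1 < src.length() && src.charAt(i + 1) == '=') {
                out.add(src.substring(i, i + 2));
                i += 2;
            } else {
                out.add(String.valueOf(c));
                i++;
            }
        }
        return out;
    }

    private String expression() { return equality(); }

    // Binary levels loop, folding each new operator into the left operand:
    // this is what makes them left-associative.
    private String equality() {
        String expr = comparison();
        while (match("==", "!=")) expr = binary(expr, previous(), comparison());
        return expr;
    }
    private String comparison() {
        String expr = term();
        while (match(">", ">=", "<", "<=")) expr = binary(expr, previous(), term());
        return expr;
    }
    private String term() {
        String expr = factor();
        while (match("-", "+")) expr = binary(expr, previous(), factor());
        return expr;
    }
    private String factor() {
        String expr = unary();
        while (match("/", "*")) expr = binary(expr, previous(), unary());
        return expr;
    }

    // Unary calls itself for its operand, so it is right-associative.
    private String unary() {
        if (match("!", "-")) {
            String op = previous();
            return "(" + op + unary() + ")";
        }
        return primary();
    }

    private String primary() {
        if (match("(")) {
            String inner = expression();
            if (!match(")")) throw new IllegalArgumentException("Expect ')' after expression.");
            return inner;
        }
        if (current < tokens.size() && Character.isDigit(tokens.get(current).charAt(0))) {
            return tokens.get(current++);
        }
        throw new IllegalArgumentException("Expect expression.");
    }

    private static String binary(String left, String op, String right) {
        return "(" + left + " " + op + " " + right + ")";
    }
    private boolean match(String... expected) {
        for (String e : expected) {
            if (current < tokens.size() && tokens.get(current).equals(e)) { current++; return true; }
        }
        return false;
    }
    private String previous() { return tokens.get(current - 1); }

    public static void main(String[] args) {
        System.out.println(parse("5 - 3 - 1"));   // ((5 - 3) - 1)
        System.out.println(parse("1 + 2 * 3"));   // (1 + (2 * 3))
        System.out.println(parse("-1 == 2 < 3")); // ((-1) == (2 < 3))
    }
}
```

The `while` loops give the binary levels their left associativity, and the self-call in `unary()` gives unary operators their right associativity, matching the table.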
Syntax Errors & Recovery
Role of the Parser
A parser has two primary responsibilities:
- Generate a syntax tree for valid input
- Detect and report errors for invalid input
Error Handling Requirements
A well-designed parser should:
- Detect and clearly report syntax errors
- Avoid crashing or entering infinite loops
- Continue parsing after encountering errors when possible
- Report multiple errors in a single pass
- Minimize cascading errors caused by earlier failures
Error Recovery
Error recovery is the mechanism that allows the parser to continue processing after encountering an error.
Panic Mode Recovery
In panic mode, the parser immediately stops processing the current construct when an error is detected. It then skips tokens until it reaches a point where parsing can safely resume. This process is called synchronization.
Entering Panic Mode
For example, while parsing a parenthesized expression, if the parser fails to find the expected closing ), it reports an error and enters panic mode.
Synchronization in Recursive Descent
In recursive descent parsing, the parser’s state is implicitly stored in the call stack. Each active grammar rule corresponds to a function call. To recover from an error:
- The parser unwinds the call stack
- Skips tokens until a safe synchronization point is found
- Resumes parsing from a stable state
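A minimal panic-mode sketch in Java, assuming a toy grammar whose only statement is `print <integer> ;` and whose tokens are plain strings. A `ParseError` exception unwinds the call stack back to `declaration()`, and `synchronize()` discards tokens until it has passed a `;` or sees a statement keyword. The keyword list is borrowed from Lox and is an assumption for Sun.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy parser illustrating panic-mode recovery and synchronization.
class PanicModeParser {
    private static final Set<String> STATEMENT_KEYWORDS =
            Set.of("class", "fun", "var", "for", "if", "while", "print", "return");

    // Thrown to unwind the recursive-descent call stack.
    private static class ParseError extends RuntimeException {}

    private final List<String> tokens;
    private final List<String> errors = new ArrayList<>();
    private int current = 0;

    private PanicModeParser(List<String> tokens) { this.tokens = tokens; }

    // Parses all statements in one pass and returns every error found.
    static List<String> check(List<String> tokens) {
        PanicModeParser parser = new PanicModeParser(tokens);
        while (!parser.isAtEnd()) parser.declaration();
        return parser.errors;
    }

    private void declaration() {
        try {
            printStatement();
        } catch (ParseError e) {
            synchronize(); // recover, then continue with the next statement
        }
    }

    private void printStatement() {
        consume("print", "Expect 'print'.");
        if (isAtEnd() || !peek().matches("\\d+")) throw error("Expect number.");
        advance();
        consume(";", "Expect ';' after value.");
    }

    // Skip tokens until just past a ';' or just before a statement keyword.
    private void synchronize() {
        advance(); // discard the token that caused the error
        while (!isAtEnd()) {
            if (previous().equals(";")) return;
            if (STATEMENT_KEYWORDS.contains(peek())) return;
            advance();
        }
    }

    private void consume(String expected, String message) {
        if (isAtEnd() || !peek().equals(expected)) throw error(message);
        advance();
    }

    // Records the error and returns (not throws) it, so callers decide.
    private ParseError error(String message) {
        String where = isAtEnd() ? "end" : "'" + peek() + "'";
        errors.add("Error at " + where + ": " + message);
        return new ParseError();
    }

    private boolean isAtEnd() { return current >= tokens.size(); }
    private String peek() { return tokens.get(current); }
    private String previous() { return tokens.get(current - 1); }
    private void advance() { if (!isAtEnd()) current++; }

    public static void main(String[] args) {
        List<String> source = List.of("print", ";", "print", "x", ";", "print", "3", ";");
        check(source).forEach(System.out::println);
        // Error at ';': Expect number.
        // Error at 'x': Expect number.
    }
}
```

Both malformed statements are reported in a single pass, and the valid `print 3;` that follows them still parses cleanly.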
Statements & Expressions
Expression Statements
An expression statement allows an expression to appear where a statement is expected. These are primarily used when evaluating expressions that produce side effects, such as function calls.
someFunction();
Print Statements
A print statement evaluates an expression and displays its result to the user.
print 2 + 1;
Grammar
program → statement* EOF ;
statement → exprStmt
| printStmt ;
exprStmt → expression ";" ;
printStmt → "print" expression ";" ;
Expressions vs Statements
- Expressions produce values and can be nested.
- Statements perform actions and control execution.
Statement Syntax Trees
Expressions and statements are represented using separate class hierarchies in the AST.
- Expr → represents expressions (value-producing)
- Stmt → represents statements (execution-oriented)
Keeping the two hierarchies separate provides:
- Type safety (compile-time validation)
- Code clarity and maintainability
- Clear separation of responsibilities
Base Class
abstract class Stmt {}
The AST generator is extended to include statements, producing specific subclasses such as print and expression statements.
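A sketch of what separate `Expr` and `Stmt` hierarchies can look like in Java, using the Visitor pattern so one interpreter class can handle both. The node set (`Literal`, `Binary`, `Expression`, `Print`), the visitor method names, and `MiniInterpreter` are illustrative assumptions modeled on Lox, not Sun's generated code.

```java
import java.util.List;

// Expressions produce values.
abstract class Expr {
    interface Visitor<R> {
        R visitLiteralExpr(Literal expr);
        R visitBinaryExpr(Binary expr);
    }
    abstract <R> R accept(Visitor<R> visitor);

    static class Literal extends Expr {
        final Object value;
        Literal(Object value) { this.value = value; }
        <R> R accept(Visitor<R> visitor) { return visitor.visitLiteralExpr(this); }
    }

    static class Binary extends Expr {
        final Expr left;
        final String operator;
        final Expr right;
        Binary(Expr left, String operator, Expr right) {
            this.left = left;
            this.operator = operator;
            this.right = right;
        }
        <R> R accept(Visitor<R> visitor) { return visitor.visitBinaryExpr(this); }
    }
}

// Statements are executed for their effect and produce no value.
abstract class Stmt {
    interface Visitor<R> {
        R visitExpressionStmt(Expression stmt);
        R visitPrintStmt(Print stmt);
    }
    abstract <R> R accept(Visitor<R> visitor);

    static class Expression extends Stmt {
        final Expr expression;
        Expression(Expr expression) { this.expression = expression; }
        <R> R accept(Visitor<R> visitor) { return visitor.visitExpressionStmt(this); }
    }

    static class Print extends Stmt {
        final Expr expression;
        Print(Expr expression) { this.expression = expression; }
        <R> R accept(Visitor<R> visitor) { return visitor.visitPrintStmt(this); }
    }
}

// Expr visitors return a value; Stmt visitors return Void.
class MiniInterpreter implements Expr.Visitor<Object>, Stmt.Visitor<Void> {
    final StringBuilder output = new StringBuilder(); // collects printed lines

    void interpret(List<Stmt> statements) {
        for (Stmt statement : statements) statement.accept(this);
    }

    public Object visitLiteralExpr(Expr.Literal expr) { return expr.value; }

    public Object visitBinaryExpr(Expr.Binary expr) {
        double left = (double) expr.left.accept(this);
        double right = (double) expr.right.accept(this);
        switch (expr.operator) {
            case "+": return left + right;
            case "*": return left * right;
            default: throw new IllegalStateException("Unsupported operator " + expr.operator);
        }
    }

    public Void visitExpressionStmt(Stmt.Expression stmt) {
        stmt.expression.accept(this); // evaluate for side effects, discard the value
        return null;
    }

    public Void visitPrintStmt(Stmt.Print stmt) {
        output.append(stringify(stmt.expression.accept(this))).append('\n');
        return null;
    }

    // Prints whole-number doubles without a trailing ".0".
    private static String stringify(Object value) {
        if (value == null) return "nil";
        String text = value.toString();
        if (value instanceof Double && text.endsWith(".0")) return text.substring(0, text.length() - 2);
        return text;
    }
}
```

Because `Stmt.Print` takes an `Expr`, passing a statement where an expression is required is rejected by the Java compiler, which is the type-safety benefit listed above.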
Variables & Declarations
Variable Declaration
A variable declaration introduces a new binding between a name and a value.
var beverage = "espresso";
Variable Access
A variable expression retrieves the value associated with a name.
print beverage;
Grammar
program → declaration* EOF ;
declaration → varDecl
| statement ;
varDecl → "var" IDENTIFIER ( "=" expression )? ";" ;
primary → IDENTIFIER | ... ;
If no initializer is provided, the variable is assigned a default value (nil).
Accessing a variable before it is defined results in a runtime error.
Environment & Variable Storage
Variable bindings are stored in a structure called an environment. Internally, this behaves like a map:
- Keys → variable names
- Values → runtime values
private Environment environment = new Environment();
Operations supported:
- Define a variable
- Retrieve a variable’s value
- Update an existing variable
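A minimal Java sketch of such an environment, assuming a `HashMap` and a plain `RuntimeException` in place of the interpreter's own error type. It uses `containsKey` rather than a null check because `nil` is stored as Java `null`.

```java
import java.util.HashMap;
import java.util.Map;

// Maps variable names to runtime values; nil is stored as Java null.
class Environment {
    private final Map<String, Object> values = new HashMap<>();

    // `var name = value;` (or `var name;` with value == null).
    void define(String name, Object value) {
        values.put(name, value);
    }

    // Reading a name that was never defined is a runtime error.
    Object get(String name) {
        if (values.containsKey(name)) return values.get(name);
        throw new RuntimeException("Undefined variable '" + name + "'.");
    }

    // Assignment only updates an existing binding; it never creates one.
    void assign(String name, Object value) {
        if (!values.containsKey(name)) {
            throw new RuntimeException("Undefined variable '" + name + "'.");
        }
        values.put(name, value);
    }

    public static void main(String[] args) {
        Environment env = new Environment();
        env.define("beverage", "espresso"); // var beverage = "espresso";
        env.define("a", null);              // var a;  (defaults to nil)
        env.assign("a", 2.0);               // a = 2;
        System.out.println(env.get("beverage") + " " + env.get("a")); // espresso 2.0
    }
}
```

Here `define` silently replaces an existing binding, as Lox allows for globals; whether Sun permits redeclaration the same way is an assumption.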
Assignment
Assignment allows updating the value of an existing variable.
a = 2;
In this language, assignment is an expression, not a statement. It has the lowest precedence and is right-associative.
expression → assignment ;
assignment → IDENTIFIER "=" assignment
| equality ;
Key Concept
- l-value: location being assigned to
- r-value: value being assigned
Because assignment is an expression, its value can be used directly:
print a = 2; // prints 2
Scope & Block Execution
A scope defines where a variable is accessible.
This language uses lexical scope, meaning variable resolution is determined by the structure of the code.
Block Scope
{
var a = "inside";
}
print a; // Error
Variables declared inside a block are only accessible within that block.
Nested Scope & Shadowing
var a = "global";
{
var a = "local";
print a; // local
}
print a; // global
Inner variables can shadow outer variables without modifying them.
Environment Chaining
Each block creates a new environment linked to its enclosing one. Variable lookup proceeds from:
- Current (innermost) scope
- Outward through enclosing scopes
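Giving each environment a link to its enclosing one produces this outward lookup. The Java below is a sketch under that assumption (`ScopedEnvironment` and its methods are hypothetical names): `get` and `assign` recurse outward, while `define` always writes to the innermost scope, which is what makes shadowing work.

```java
import java.util.HashMap;
import java.util.Map;

// Each scope has its own bindings plus a link to the enclosing scope.
class ScopedEnvironment {
    final ScopedEnvironment enclosing; // null for the global scope
    private final Map<String, Object> values = new HashMap<>();

    ScopedEnvironment() { this(null); }
    ScopedEnvironment(ScopedEnvironment enclosing) { this.enclosing = enclosing; }

    // Declarations always go into the current (innermost) scope.
    void define(String name, Object value) { values.put(name, value); }

    // Look in the innermost scope first, then walk outward.
    Object get(String name) {
        if (values.containsKey(name)) return values.get(name);
        if (enclosing != null) return enclosing.get(name);
        throw new RuntimeException("Undefined variable '" + name + "'.");
    }

    // Assignment updates the nearest scope that already defines the name.
    void assign(String name, Object value) {
        if (values.containsKey(name)) { values.put(name, value); return; }
        if (enclosing != null) { enclosing.assign(name, value); return; }
        throw new RuntimeException("Undefined variable '" + name + "'.");
    }

    public static void main(String[] args) {
        ScopedEnvironment global = new ScopedEnvironment();
        global.define("a", "global");             // var a = "global";

        ScopedEnvironment block = new ScopedEnvironment(global);
        block.define("a", "local");               // { var a = "local";
        System.out.println(block.get("a"));       //   print a; }  -> local
        System.out.println(global.get("a"));      // print a;      -> global
    }
}
```

The `main` method replays the shadowing example above: the inner `a` hides the outer one without modifying it.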
Block Grammar
statement → exprStmt
| printStmt
| block ;
block → "{" declaration* "}" ;
Control Flow Overview
Any sufficiently expressive programming language is capable of performing arbitrary computation. This idea is formalized through models like Turing machines and lambda calculus.
In practice, control flow determines how a program executes. It can be broadly divided into:
- Conditional flow: executes code selectively
- Looping flow: repeats execution of code
Conditional Execution
The if statement enables conditional execution based on a boolean expression.
statement → exprStmt
| ifStmt
| printStmt
| block ;
ifStmt → "if" "(" expression ")" statement
( "else" statement )? ;
Behavior
- If the condition is truthy → execute the first statement
- If falsey and an else branch exists → execute the alternative statement
Dangling Else Problem
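When nested if statements omit braces, it can be unclear which if an else belongs to. This ambiguity is resolved by associating the else with the nearest preceding if. For example, in
if (first) if (second) print "a"; else print "b";
the else belongs to if (second), not to if (first).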
Logical Operators
The logical operators and and or are also control flow constructs, because they determine whether their right operand is evaluated at all.
expression → assignment ;
assignment → IDENTIFIER "=" assignment
| logic_or ;
logic_or → logic_and ( "or" logic_and )* ;
logic_and → equality ( "and" equality )* ;
These operators use short-circuit evaluation:
- or stops when a truthy value is found
- and stops when a falsey value is found
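A Java sketch of short-circuit evaluation in which operands are passed as `Supplier`s, so the right-hand side runs only if needed. It assumes Lox-style semantics that this document does not spell out: `nil` and `false` are falsey, everything else is truthy, and `and`/`or` return one of their operand values rather than a coerced boolean.

```java
import java.util.function.Supplier;

class LogicalOps {
    // Assumed Lox-style truthiness: only nil (null) and false are falsey.
    static boolean isTruthy(Object value) {
        if (value == null) return false;
        if (value instanceof Boolean) return (Boolean) value;
        return true;
    }

    // `left or right`: if left is truthy, return it without evaluating right.
    static Object or(Supplier<Object> left, Supplier<Object> right) {
        Object l = left.get();
        return isTruthy(l) ? l : right.get();
    }

    // `left and right`: if left is falsey, return it without evaluating right.
    static Object and(Supplier<Object> left, Supplier<Object> right) {
        Object l = left.get();
        return isTruthy(l) ? right.get() : l;
    }

    public static void main(String[] args) {
        // The right operand would throw if it were ever evaluated.
        System.out.println(or(() -> "hi", () -> { throw new IllegalStateException("never evaluated"); })); // hi
        System.out.println(and(() -> null, () -> "unused"));   // null (nil)
        System.out.println(or(() -> false, () -> "fallback")); // fallback
    }
}
```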
While Loop
The while loop repeatedly executes a statement as long as its condition remains truthy.
statement → exprStmt
| ifStmt
| printStmt
| whileStmt
| block ;
whileStmt → "while" "(" expression ")" statement ;
Behavior
- Evaluate condition
- If truthy → execute body
- Repeat until the condition becomes falsey
For Loop
The for loop provides a compact way to write iteration logic.
forStmt → "for" "(" ( varDecl | exprStmt | ";" )
expression? ";"
expression? ")" statement ;
A for loop has three clauses, each of which may be omitted:
- Initializer: runs once before the loop starts
- Condition: checked before each iteration
- Increment: executed after each iteration
Example:
for (var i = 0; i < 10; i = i + 1) print i;
Internally, a for loop can be transformed into an equivalent while loop, making it a higher-level construct built on top of simpler control flow.
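A sketch of that transformation (often called desugaring) in Java. The record-based AST is hypothetical and stores expressions as raw strings to stay short; the point is the shape of the rewrite: the increment is appended to the body, the body is wrapped in a `While`, and an initializer, if present, goes in an enclosing block so the loop variable stays scoped to the loop.

```java
import java.util.List;

// Hypothetical mini AST; expressions are raw source strings for brevity.
interface Stmt {}
record Block(List<Stmt> statements) implements Stmt {}
record While(String condition, Stmt body) implements Stmt {}
record ExprStmt(String expression) implements Stmt {}
record VarDecl(String name, String initializer) implements Stmt {}
record Print(String expression) implements Stmt {}

class ForDesugar {
    // Rewrites `for (initializer; condition; increment) body` as a while loop.
    // Any clause may be null because each one is optional in the grammar.
    static Stmt desugar(Stmt initializer, String condition, String increment, Stmt body) {
        if (increment != null) {
            // Run the increment after the body on every iteration.
            body = new Block(List.of(body, new ExprStmt(increment)));
        }
        // A missing condition loops forever.
        body = new While(condition != null ? condition : "true", body);
        if (initializer != null) {
            // The outer block keeps the loop variable scoped to the loop.
            body = new Block(List.of(initializer, body));
        }
        return body;
    }

    public static void main(String[] args) {
        // for (var i = 0; i < 10; i = i + 1) print i;
        Stmt loop = desugar(new VarDecl("i", "0"), "i < 10", "i = i + 1", new Print("i"));
        System.out.println(loop);
    }
}
```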
Function Calls
A function call evaluates a callee expression and invokes it with arguments.
callee(arguments);
The callee is not limited to identifiers—it can be any expression that evaluates to a callable object.
Grammar
unary → ( "!" | "-" ) unary | call ;
call → primary ( "(" arguments? ")" )* ;
This allows chained calls such as:
fn(1)(2)(3);
Evaluation Process
- Evaluate the callee expression
- Evaluate each argument (left to right)
- Invoke the callable with evaluated arguments
Callable objects implement a common interface (e.g., SunCallable) which defines how calls are executed.
Call Errors & Arity
Function calls must be validated before execution.
Invalid Call Targets
If the callee is not callable, a runtime error is raised:
"not a function"();
Arity Checking
Each function defines an arity (number of expected arguments).
fun add(a, b, c) {
print a + b + c;
}
Calling with incorrect argument count results in an error:
add(1, 2); // too few
add(1, 2, 3, 4); // too many
The interpreter validates argument count before invocation to ensure correctness.
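The evaluation steps and checks described above could be combined as in this Java sketch. `SunCallable` is named in this document, but the two methods shown (`arity()` and `call(List<Object>)`) are an assumption modeled on Lox's `LoxCallable`; `Supplier`s stand in for unevaluated argument expressions, and the error messages are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Assumed shape of the callable interface.
interface SunCallable {
    int arity();
    Object call(List<Object> arguments);
}

class CallEvaluator {
    static Object evaluateCall(Supplier<Object> callee, List<Supplier<Object>> argumentExprs) {
        Object target = callee.get();                    // 1. evaluate the callee
        List<Object> arguments = new ArrayList<>();
        for (Supplier<Object> expr : argumentExprs) {    // 2. evaluate arguments left to right
            arguments.add(expr.get());
        }
        if (!(target instanceof SunCallable)) {          // 3a. reject non-callable targets
            throw new RuntimeException("Can only call functions and classes.");
        }
        SunCallable function = (SunCallable) target;
        if (arguments.size() != function.arity()) {      // 3b. check arity before invoking
            throw new RuntimeException("Expected " + function.arity()
                    + " arguments but got " + arguments.size() + ".");
        }
        return function.call(arguments);                 // 4. invoke
    }

    public static void main(String[] args) {
        // Roughly: fun add(a, b, c) { return a + b + c; }
        SunCallable add = new SunCallable() {
            public int arity() { return 3; }
            public Object call(List<Object> a) {
                return (double) a.get(0) + (double) a.get(1) + (double) a.get(2);
            }
        };
        List<Supplier<Object>> three = List.of(() -> 1.0, () -> 2.0, () -> 3.0);
        System.out.println(evaluateCall(() -> add, three)); // 6.0
        List<Supplier<Object>> two = List.of(() -> 1.0, () -> 2.0);
        try {
            evaluateCall(() -> add, two);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage()); // Expected 3 arguments but got 2.
        }
    }
}
```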
Native Functions
Native functions are implemented in the host language but exposed to user programs. They are useful for:
- Accessing system features (time, IO, etc.)
- Providing built-in functionality
Example:
clock();
This function returns the current time, allowing programs to measure execution duration.
Native functions are part of the runtime and are typically registered in the global environment.
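A sketch of how `clock()` might be registered, assuming the same hypothetical `SunCallable` shape (`arity()` and `call(List<Object>)`) and a plain map standing in for the global environment. It returns seconds as a `double`, on the assumption that Sun, like Lox, uses doubles for all numbers.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Assumed shape of the callable interface (modeled on Lox's LoxCallable).
interface SunCallable {
    int arity();
    Object call(List<Object> arguments);
}

class Natives {
    // Hypothetical global environment, reduced to a plain map for this sketch.
    static final Map<String, Object> GLOBALS = new HashMap<>();

    static {
        GLOBALS.put("clock", new SunCallable() {
            @Override public int arity() { return 0; }

            // Wall-clock seconds since the Unix epoch, as a double, so Sun code
            // can subtract two readings to time a computation.
            @Override public Object call(List<Object> arguments) {
                return System.currentTimeMillis() / 1000.0;
            }

            @Override public String toString() { return "<native fn>"; }
        });
    }

    public static void main(String[] args) {
        SunCallable clock = (SunCallable) GLOBALS.get("clock");
        double start = (double) clock.call(List.of());
        double elapsed = (double) clock.call(List.of()) - start;
        System.out.println("elapsed seconds: " + elapsed);
    }
}
```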
Function Declarations
Functions are declared using the fun keyword.
declaration → funDecl
| varDecl
| statement ;
funDecl → "fun" function ;
function → IDENTIFIER "(" parameters? ")" block ;
A function declaration:
- Binds a name to a callable object
- Defines parameters and a body
Return Statements
A return statement exits a function and optionally provides a value.
return expression;
If no value is provided, the function returns nil.
Execution Behavior
Return statements may appear inside nested constructs, but they must immediately exit the entire function. To implement this, the interpreter uses a controlled mechanism (such as exceptions) to unwind execution until the function boundary is reached.
This ensures:
- Immediate function exit
- Correct value propagation
- Consistent execution behavior
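A Java sketch of the exception-based approach. `Return` carries the value up the Java call stack until the function-call code catches it; the four-argument `RuntimeException` constructor turns off stack-trace capture, since this exception is control flow rather than an error. `Return`, `body`, and `callFunction` are hypothetical stand-ins, not Sun's actual classes.

```java
// Carries a function's return value up the Java call stack.
class Return extends RuntimeException {
    final Object value;

    Return(Object value) {
        // No message, no cause, suppression disabled, no stack trace captured:
        // this exception is control flow, not an error.
        super(null, null, false, false);
        this.value = value;
    }
}

class ReturnDemo {
    // Stand-in for a function body: the `return` fires inside a loop inside
    // an if, and must still exit the whole function at once.
    static void body(int limit) {
        if (limit > 0) {
            for (int i = 0; ; i++) {
                if (i == limit) throw new Return((double) i);
            }
        }
        // Falling off the end means no return statement ran.
    }

    // Stand-in for calling a user function: catch Return at the boundary.
    static Object callFunction(int limit) {
        try {
            body(limit);
        } catch (Return r) {
            return r.value;
        }
        return null; // implicit nil
    }

    public static void main(String[] args) {
        System.out.println(callFunction(3)); // 3.0
        System.out.println(callFunction(0)); // null (nil)
    }
}
```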
Evaluate expressions
To evaluate expressions, the interpreter needs executable logic associated with each type of syntax node. One possible design is to embed this logic directly into the syntax tree classes using a method like interpret(), allowing each node to evaluate itself.
This approach is similar to how the AstPrinter works. That class traverses the syntax tree recursively and builds a string representation. An interpreter follows the same traversal pattern, but instead of producing strings, it computes and returns runtime values.
Evaluating Literals
Literals are the most basic building blocks of expressions. They represent fixed values written directly in the source code.
- A literal is a piece of syntax that produces a value.
- It always originates from the user’s source code.
- It belongs to the parser’s domain, not the runtime.
Evaluating Parentheses (Grouping)
Grouping expressions—created using parentheses—are evaluated by first evaluating the enclosed expression and then returning its result. The grouping itself does not introduce new computation; it only controls evaluation order.
Evaluating Unary Expressions
Unary expressions operate on a single operand. The interpreter first evaluates the operand and then applies the operator. This evaluation follows a post-order traversal:
- First evaluate child expressions
- Then apply the operator at the current node
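A Java sketch of evaluating these three node kinds, using records as hypothetical stand-ins for Sun's AST classes and a single `evaluate` method instead of a visitor. For unary expressions it evaluates the operand first and then applies the operator (post-order). The truthiness rule used by `!` (only `nil` and `false` are falsey) is an assumption borrowed from Lox.

```java
// Hypothetical record-based AST nodes.
interface Expr {}
record Literal(Object value) implements Expr {}
record Grouping(Expr expression) implements Expr {}
record Unary(String operator, Expr right) implements Expr {}

class Evaluator {
    static Object evaluate(Expr expr) {
        if (expr instanceof Literal) {
            // The parser already produced the runtime value; just return it.
            return ((Literal) expr).value();
        }
        if (expr instanceof Grouping) {
            // Parentheses add no computation: evaluate the inner expression.
            return evaluate(((Grouping) expr).expression());
        }
        if (expr instanceof Unary) {
            Unary unary = (Unary) expr;
            Object right = evaluate(unary.right()); // child first (post-order)...
            switch (unary.operator()) {             // ...then this node's operator
                case "-": return -(double) right;
                case "!": return !isTruthy(right);
                default: throw new IllegalArgumentException("Unknown operator " + unary.operator());
            }
        }
        throw new IllegalArgumentException("Unknown expression " + expr);
    }

    // Assumed Lox-style truthiness: only nil (null) and false are falsey.
    static boolean isTruthy(Object value) {
        if (value == null) return false;
        if (value instanceof Boolean) return (Boolean) value;
        return true;
    }

    public static void main(String[] args) {
        // -(-3)
        System.out.println(evaluate(new Unary("-", new Grouping(new Unary("-", new Literal(3.0)))))); // 3.0
        // !nil
        System.out.println(evaluate(new Unary("!", new Literal(null)))); // true
    }
}
```

Applying `-` to a non-number makes the `(double)` cast throw `ClassCastException`, which is exactly the kind of crash the runtime-error handling below is meant to replace.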
Runtime error logic
Runtime Errors
During evaluation, expressions may produce values of unexpected types. For example, attempting to perform numeric operations on a string leads to invalid behavior.
In a naive implementation, such mismatches cause runtime crashes, for example a ClassCastException in Java, which terminate the interpreter and print an internal stack trace. This is undesirable for a user-facing language.
Consider the expression:
2 * (3 / -"muffin")
The unary - operator cannot be applied to a string. This error occurs deep within the expression, making the entire computation invalid. As a result:
- The unary operation fails
- The division cannot proceed
- The multiplication also becomes invalid
Rather than crashing, the interpreter should:
- Stop evaluation of the current expression
- Report a meaningful error to the user
- Allow the interpreter to continue running
Detecting Runtime Errors
Since the interpreter evaluates expressions recursively, an error occurring deep in the evaluation stack must propagate outward. The preferred approach is to use controlled exception handling:
- Throw a custom runtime error when an invalid operation is detected
- Include source-level information (such as the token) in the error
- Catch the error at a higher level to prevent interpreter termination
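A Java sketch of this approach. `Token`, `RuntimeError`, `checkNumberOperand`, and the `[line N]` report format are hypothetical names modeled on Lox; the error is reported as a returned string to keep the example small.

```java
// Simplified stand-in for the scanner's token type.
record Token(String lexeme, int line) {}

// User-facing runtime error that remembers which token caused it.
class RuntimeError extends RuntimeException {
    final Token token;

    RuntimeError(Token token, String message) {
        super(message);
        this.token = token;
    }
}

class RuntimeErrorDemo {
    // Validate before any numeric operation instead of relying on a raw cast.
    static double checkNumberOperand(Token operator, Object operand) {
        if (operand instanceof Double) return (Double) operand;
        throw new RuntimeError(operator, "Operand must be a number.");
    }

    // Top level: catch the error, report it, and keep the interpreter alive.
    static String run(Token operator, Object operand) {
        try {
            return String.valueOf(-checkNumberOperand(operator, operand));
        } catch (RuntimeError error) {
            return "[line " + error.token.line() + "] " + error.getMessage();
        }
    }

    public static void main(String[] args) {
        Token minus = new Token("-", 1);
        System.out.println(run(minus, 2.0));      // -2.0
        System.out.println(run(minus, "muffin")); // [line 1] Operand must be a number.
        System.out.println("still running");
    }
}
```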
Scope & Binding
Environment Chain
// TODO: pseudocode or real code showing your env structure