Part 3 of kscript (Basic Functions, generics): Writing a dynamic, interpreted, duck-typed language

 

This is part 3 of my series on kscript.

In this installment, we’ll implement functions that operate on objects and can return a value.

Alright, now that we have objects, we want to do things with them. In programming, “doing something” is a function, and that function might produce some other value (i.e. return a value). Or it can just do something and return nothing.

We’ll adopt Python’s convention: a function that has nothing to return returns a none object instead (like Python’s None).

We haven’t talked at all about parsing, grammar, tokenizing, etc., so we can’t write functions in kscript itself yet. Instead, we’ll write functions in C that kscript code will eventually be able to call. In fact, I expect most of the standard library to be written in C, much like Python’s. That way, we get better performance and more room for optimization.

For now, we’ll just implement these functions in src/kscript.c. Eventually, we’ll have a modular system for loading shared libraries (.so files) at runtime, so we can load any kind of function or library!

So, the most important basic functions should be print, repr, and add. repr is short for “representation”: it takes a value and returns a string representation of it. print will use it internally whenever it is given something that isn’t a string.

Functions can also take a variable number of arguments, and I think the easiest way to handle that is for each function to accept an array of arguments plus its length. Here’s what I mean, in the definition of repr:

// src/kscript.c

// return (string) representation of a single argument
ks_obj ks_std_repr(int args_n, ks_obj* args) {
    if (args_n != 1) {
        ks_error("repr takes %d args, was given %d", 1, args_n);
        return ks_obj_new_none();
    }

    // get the only argument
    ks_obj A = args[0];
    // the result we'll return
    ks_obj ret = ks_obj_new_str(KS_STR_EMPTY);

    // check type
    if (A->type == KS_TYPE_NONE) {
        ks_str_copy(&ret->_str, KS_STR_CONST("NONE"));
    } else if (A->type == KS_TYPE_INT) {
        char tmp[100];
        sprintf(tmp, "%ld", A->_int);
        ks_str_copy_cp(&ret->_str, tmp, strlen(tmp));
    } else if (A->type == KS_TYPE_FLOAT) {
        char tmp[100];
        sprintf(tmp, "%lf", A->_float);
        ks_str_copy_cp(&ret->_str, tmp, strlen(tmp));
    } else if (A->type == KS_TYPE_STR) {
        ks_str_copy(&ret->_str, A->_str);
    } else {
        ks_error("repr given unknown type (id: %d)", A->type);
    }

    return ret;
}

So, the function receives a pointer to an array of ks_objs for the arguments, plus an integer saying how many there are. repr takes exactly one argument, so if it is given any other number, it reports an error and returns none.

Some might think it’s easier to write ks_obj ks_std_repr(ks_obj A), but an argument array makes it much easier to deal with functions that take different numbers of parameters, or a variable number of them. It also means every function has the same signature, so they can all be exported and called in a uniform way.

Now, consider the function print:

// print all arguments joined by spaces: strings are printed as-is,
//   anything else is printed using its `repr`
ks_obj ks_std_print(int args_n, ks_obj* args) {

    int i;
    for (i = 0; i < args_n; ++i) {
        if (i != 0) printf(" ");

        // print strings directly, so they never pick up quotes from `repr`
        if (args[i]->type == KS_TYPE_STR) {
            printf("%s", args[i]->_str._);
            continue;
        }

        ks_obj repr = ks_std_repr(1, &args[i]);

        if (repr->type == KS_TYPE_STR) {
            printf("%s", repr->_str._);
        } else {
            ks_error("Internal error; `repr` gave a non-string");
            ks_obj_free(repr);
            return ks_obj_new_none();
        }

        ks_obj_free(repr);
    }

    // end with a newline
    printf("\n");

    return ks_obj_new_none();
}

It uses repr internally and prints any number of arguments (even zero!), separated by spaces and followed by a newline.

For example, if we change the rest of src/kscript.c to be:

// src/kscript.c

int main(int argc, char** argv) {

    ks_obj sconst = ks_obj_new_str(KS_STR_CONST("ANSWER OF ALL IS"));
    ks_obj num = ks_obj_new_int(42);
    
    ks_std_print(2, (ks_obj[]){ sconst, num });

    // always free when you're done!
    ks_obj_free(sconst);
    ks_obj_free(num);

    return 0;
}

Then, running make && ./kscript gives:

$ make && ./kscript
cc -O3 -std=c99 -fPIC src/kscript.c -c -o src/kscript.o
cc -O3 -std=c99 -L./ src/kscript.o -lkscript -o kscript
ANSWER OF ALL IS 42

So it works: print outputs each of its arguments using its string representation.

Let’s start to generalize this. Let’s define a function-pointer type in C:

// src/kscript.h

// a C-function signature
typedef ks_obj (*ksf_cfunc)(int args_n, ks_obj* args);


We’ll also add a KS_TYPE_CFUNC type, and a corresponding union member to struct ks_obj (obj->_cfunc).

Now, functions can be objects too!

Dictionaries

If we look at Python, we notice that a lot of it is built on dictionaries: a typical object is essentially a dictionary mapping its attribute names to their values (its __dict__). For objects other than integers, floats, strings, etc., we also want Python’s so-called ‘duck typing’, so we’ll introduce a utility type, ks_dict, which stores key-value pairs with ks_str keys and ks_obj values.

This isn’t too complicated, and here’s the definition:

// src/kscript.h

// kscript dictionary, translates ks_str->ks_obj's 
typedef struct {

    // number of entries
    int len;

    // the maximum number of entries
    int max_len;

    // a list of keys
    ks_str* keys;

    // list of their values
    ks_obj* vals;

} ks_dict;

// the empty, starting dictionary
#define KS_DICT_EMPTY ((ks_dict){ .len = 0, .max_len = 0, .keys = NULL, .vals = NULL })

// returns the index of the key into the dictionary, or -1 if the key doesn't exist within it
int ks_dict_geti(ks_dict* dict, ks_str key);
// sets dictionary at a given index, or if `idx` is -1, adds the value to the dictionary
// returns the index of the object added (same as `idx`, unless `idx` was -1)
int ks_dict_seti(ks_dict* dict, int idx, ks_obj val);
// sets the dictionary's value for a given key, and returns the index at which it is located now
int ks_dict_set(ks_dict* dict, ks_str key, ks_obj val);
// frees the dictionary and its resources
void ks_dict_free(ks_dict* dict);

You can find the implementation of these functions in src/dict.c (see the source link at the end of this post). It isn’t too important to show here, but writing it yourself is a good learning exercise if you’ve never implemented a dictionary before.

Essentially, to get a value, you search through the keys, and if you find a matching key, you return the index. If not found, return -1.

The caller can then check whether the key exists (the index is >= 0). If it doesn’t, throw an error or otherwise handle it; if it does, just read dict.vals[idx] to get the value for that key. There’s also logic to set the value for a key, appending a new entry if the key isn’t there yet.

Note that we haven’t implemented ks_dict_del to delete items yet. That might come eventually, but we don’t need it at the moment.

And, with some changes to our struct ks_obj, we now have:

// src/kscript.h

// types of objects
enum {
    // the none-type, null-type, etc
    KS_TYPE_NONE = 0,

    // builtin integer type
    KS_TYPE_INT,

    // builtin floating point type
    KS_TYPE_FLOAT,

    // builtin string type
    KS_TYPE_STR,

    // builtin C-function type (of signature ksf_cfunc)
    KS_TYPE_CFUNC,

    // this isn't a type, but is just the starting point for custom types. So you can test
    //   if `obj->type >= KS_TYPE_CUSTOM` to determine whether or not it is a built-in type
    KS_TYPE_CUSTOM
    
};

// the internal storage of an object. However, most code should just use
//   `ks_obj` (no struct), as it will be a pointer.
struct ks_obj {

    // one of the `KS_TYPE_*` enum values
    uint16_t type;

    // These will be used in the future; they will hold various info
    //   about the object, for GC, reference counting etc, but for now, will be 0
    uint16_t flags;

    // an anonymous tagged union
    union {

        // if type==KS_TYPE_INT, the value
        ks_int _int;
        // if type==KS_TYPE_FLOAT, the value
        ks_float _float;
        // if type==KS_TYPE_STR, the value
        ks_str _str;

        // if type==KS_TYPE_CFUNC, the function
        ksf_cfunc _cfunc;

        // if type>=KS_TYPE_CUSTOM, it just has a dictionary of values that it keeps updated
        ks_dict _dict;

        // misc. usage
        void* _ptr;

    };
};


// returns a new none object
ks_obj ks_obj_new_none(void);
// returns a new integer with specified value
ks_obj ks_obj_new_int(ks_int val);
// returns a new float with specified value
ks_obj ks_obj_new_float(ks_float val);
// returns a new string with specified value
ks_obj ks_obj_new_str(ks_str val);
// returns a new cfunc with specified value
ks_obj ks_obj_new_cfunc(ksf_cfunc val);
// returns a new custom-type object with a fresh dictionary
ks_obj ks_obj_new_custom(void);
// frees an object and its resources
void ks_obj_free(ks_obj obj);

Now, going back to src/kscript.c, and specifically, the repr function, we have:

// src/kscript.c

// return (string) representation of a single argument
ks_obj ks_std_repr(int args_n, ks_obj* args) {
    if (args_n != 1) {
        ks_error("repr takes %d args, was given %d", 1, args_n);
        return ks_obj_new_none();
    }

    // get the only argument
    ks_obj A = args[0];
    // the result we'll return
    ks_obj ret = ks_obj_new_str(KS_STR_EMPTY);

    // check type
    if (A->type == KS_TYPE_NONE) {
        ks_str_copy(&ret->_str, KS_STR_CONST("NONE"));
    } else if (A->type == KS_TYPE_INT) {
        char tmp[100];
        sprintf(tmp, "%ld", A->_int);
        ks_str_copy_cp(&ret->_str, tmp, strlen(tmp));
    } else if (A->type == KS_TYPE_FLOAT) {
        char tmp[100];
        sprintf(tmp, "%lf", A->_float);
        ks_str_copy_cp(&ret->_str, tmp, strlen(tmp));
    } else if (A->type == KS_TYPE_STR) {
        ks_str_append_c(&ret->_str, '"');
        ks_str_append(&ret->_str, A->_str);
        ks_str_append_c(&ret->_str, '"');

    } else if (A->type >= KS_TYPE_CUSTOM) {
        ks_str_append_c(&ret->_str, '{');

        // add all entries of dictionary
        int i;
        for (i = 0; i < A->_dict.len; ++i) {
            if (i != 0) { 
                ks_str_append(&ret->_str, KS_STR_CONST(", "));
            }

            ks_str_append(&ret->_str, A->_dict.keys[i]);
            ks_str_append(&ret->_str, KS_STR_CONST(": "));

            // get repr of subobject
            ks_obj subrepr = ks_std_repr(1, &A->_dict.vals[i]);
            ks_str_append(&ret->_str, subrepr->_str);
            ks_obj_free(subrepr);

        }

        ks_str_append_c(&ret->_str, '}');
    } else {
        ks_error("repr given unknown type (id: %d)", A->type);
    }

    return ret;
}

Now custom-type objects get a structured repr: each key followed by the repr of its value, all inside braces. Changing main to:

// src/kscript.c

int main(int argc, char** argv) {

    ks_obj sconst = ks_obj_new_str(KS_STR_CONST("ANSWER OF ALL IS"));
    ks_obj num = ks_obj_new_int(42);
    ks_obj dict = ks_obj_new_custom();

    ks_dict_set(&dict->_dict, KS_STR_CONST("key1"), num);
    ks_dict_set(&dict->_dict, KS_STR_CONST("key2"), sconst);

    ks_obj print = ks_obj_new_cfunc(ks_std_print);
    
    //print->_cfunc(2, (ks_obj[]){ sconst, num });
    print->_cfunc(1, (ks_obj[]){ dict });

    //ks_std_print(2, (ks_obj[]){ sconst, num });

    // always free when you're done!
    ks_obj_free(sconst);
    ks_obj_free(num);
    ks_obj_free(dict);
    ks_obj_free(print);

    return 0;
}

Running it again, we now get:

$ make && ./kscript
cc -O3 -std=c99 -fPIC src/kscript.c -c -o src/kscript.o
cc -O3 -std=c99 -fPIC src/log.c -c -o src/log.o
cc -O3 -std=c99 -fPIC src/str.c -c -o src/str.o
cc -O3 -std=c99 -fPIC src/obj.c -c -o src/obj.o
cc -O3 -std=c99 -fPIC src/dict.c -c -o src/dict.o
cc -O3 -std=c99 -shared src/log.o src/str.o src/obj.o src/dict.o -o libkscript.so
cc -O3 -std=c99 -L./ src/kscript.o -lkscript -o kscript
{key1: 42, key2: "ANSWER OF ALL IS"}

So, we now have a working dictionary type that can store values of any type!

We’re getting close to actually writing code in the language itself. We’ll open that can of worms in the next tutorial, but we now have most of the building blocks we need for a generic programming language.

Source for this part: https://github.com/ChemicalDevelopment/kscript/tree/5f23d288dcafd06bfa1840ad9cee8a88930ce4da