Samuel’s blog and portfolio

Lots of stuff

2024-05-01T00:00:00+02:00

It’s been a couple of busy years since my last update in 2022, so I have quite a lot to share, and not only about projects and hobbies!

2023

During this year, I finally finished my Computer Science degree. This was a really busy one, as I had quite a few interesting but challenging courses, and I also took part in a research collaboration grant.

Cool 2023 projects

Computer Graphics project

We uploaded the ray tracing and photon mapping engine we developed as part of the Computer Graphics course (here). It was developed purely in modern-ish C++ and has some niceties such as multithreading, acceleration structures, basic .obj file handling and an implementation of constructive solid geometry. As a showcase, we managed to render a recreation of Twin Peaks’ Red Room:

Also, here’s the obligatory Cornell Box:

ARM Sudoku

Although this project was developed during 2021/2022, I only got around to uploading it half a year ago. This is a fully functional Sudoku game for the LPC2105 ARM7TDMI-S CPU (simulated through Keil uVision), playable through the GPIO and with visualizations for both the UART console and the GPIO itself. It was developed in a mix of C and ARM assembly (for interrupt handling, power saving…), with an event queue architecture.

Research collaboration

Most of my time during 2022 and 2023 was dedicated to a research collaboration at the SID group. Thanks to this opportunity, I was able to delve into state-of-the-art techniques within fields such as Natural Language Processing (use of LLMs, NLP libraries…) and Semantic Information Retrieval (Knowledge Graphs, IR indexing engines such as Elastic…). This collaboration is still ongoing, and we already have a conference paper pending review. I can’t wait to make the repo public, as it’s my biggest project yet :)

2024

This year, with my CS degree finished, I moved to Barcelona for an MSc in Intelligent Interactive Systems at Universitat Pompeu Fabra. I have been able to delve deeper into Natural Language Processing and AI/Machine Learning, with some Data Science in between. As the cherry on the cake, I’m also doing an internship at the European Commission’s Joint Research Centre.

…It’s been a way busier year.

Cool 2024 projects

IndexerMcIndexFace

During my research collaboration, I relied quite a lot on different BM25 / BM25F implementations for document retrieval. I found most of them lacking in different aspects, such as the freedom to tweak parameters, or even transparency, due to some straight-up weird behaviors. I wanted to see just how hard it really was to implement a (very basic) BM25F ranking function, so I wrote a really tiny-but-fast document indexing and retrieval system. I wrote it exclusively in Rust, and also used it as an excuse to play with FSTs and Rust’s parallelization capabilities.
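
To give an idea of how small the core of such a ranking function can be, here is a minimal sketch of the "simple" BM25F formulation: per-field term frequencies are length-normalized, weighted and summed into one pseudo-frequency, which is then saturated with `k1` and scaled by the IDF. This is not the code from the project; the names `FieldParams`, `FieldStat` and `bm25f_term` are made up for illustration, and the IDF shown is just one common non-negative variant.

```rust
/// Per-field tuning parameters (illustrative names, not the project's API).
pub struct FieldParams {
    pub weight: f64,  // boost applied to matches in this field
    pub b: f64,       // length-normalization strength, in [0, 1]
    pub avg_len: f64, // average length of this field across the collection
}

/// Statistics of one query term inside one field of one document.
pub struct FieldStat {
    pub tf: f64,  // occurrences of the term in this field
    pub len: f64, // length of this field in the document, in tokens
}

/// A common non-negative IDF variant (the "+1 inside the logarithm" form).
pub fn idf(num_docs: f64, doc_freq: f64) -> f64 {
    (1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5)).ln()
}

/// Contribution of a single query term to a document's BM25F score.
/// `stats` and `params` are expected to list the same fields in the same order.
pub fn bm25f_term(stats: &[FieldStat], params: &[FieldParams], k1: f64, term_idf: f64) -> f64 {
    // Weighted, length-normalized term frequency summed over all fields.
    let pseudo_tf: f64 = stats
        .iter()
        .zip(params)
        .map(|(s, p)| {
            let norm = 1.0 - p.b + p.b * s.len / p.avg_len;
            p.weight * s.tf / norm
        })
        .sum();
    // Saturation: repeated occurrences give diminishing returns.
    term_idf * pseudo_tf / (k1 + pseudo_tf)
}
```

A document's score for a whole query would then be the sum of `bm25f_term` over the query terms. Exposing `weight`, `b` and `k1` directly is the kind of parameter freedom mentioned above.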

Computational Semantics course

As part of my Master’s, I took a course on Computational Semantics, where I worked on a wide range of NLP techniques, ranging from “traditional” ones (use of WordNet and other corpora, static embeddings…) to the state of the art (contextual embeddings, multimodal LLMs…). I uploaded two of the fanciest projects here.

Conversational Agent

We also developed the inner systems of a Conversational Agent custom-tailored for hotel and restaurant requests, as part of a Natural Language Interaction course. I published its Named Entity Recognition module, which was the system I worked on the most, here. It consists of a NER system for the semantic frame slot filling task of a Conversational Agent, trained and evaluated using the MultiWOZ dataset. It is made up of different components, as we went a bit beyond simply detecting direct mentions of entities in sentences:

A NER system based on a DeBERTa-v3-base model, fine-tuned using the flair NLP library.
- Detects directly mentioned slots (restaurant food types, restaurant and hotel names…)
- Detects question and dontcare direct mentions
A K-Nearest Neighbors classifier fed with scaled sentence embeddings of user utterances (computed with bge-large-en-v1.5), used to classify question and dontcare utterances that refer indirectly to previously mentioned slots.
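
For readers unfamiliar with the second component, the sketch below shows the general idea of a K-Nearest Neighbors vote over embedding vectors. It is only an illustration: the published module is not written in Rust, the squared Euclidean distance is an assumption (the metric actually used is not stated here), and `knn_predict` is a hypothetical name. The tiny 2-dimensional vectors in the usage note stand in for real sentence embeddings.

```rust
use std::collections::HashMap;

/// Squared Euclidean distance between two vectors of equal length.
fn squared_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Majority vote among the `k` training examples closest to `query`.
/// Returns `None` when there are no training examples; ties are broken arbitrarily.
pub fn knn_predict<'a>(
    train: &[(Vec<f32>, &'a str)],
    query: &[f32],
    k: usize,
) -> Option<&'a str> {
    let mut scored: Vec<(f32, &'a str)> = train
        .iter()
        .map(|(v, label)| (squared_distance(v, query), *label))
        .collect();
    // Closest examples first.
    scored.sort_by(|a, b| a.0.total_cmp(&b.0));

    let mut votes: HashMap<&'a str, usize> = HashMap::new();
    for (_, label) in scored.into_iter().take(k) {
        *votes.entry(label).or_insert(0) += 1;
    }
    votes.into_iter().max_by_key(|(_, n)| *n).map(|(label, _)| label)
}
```

With a training set of `(embedding, label)` pairs labelled `"question"` or `"dontcare"`, calling `knn_predict(&train, &utterance_embedding, 3)` returns the label held by most of the three nearest neighbours.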

As a picture is worth a thousand words, here’s how our NER system works when talking about SpongeBob entities (which we really hope the dataset didn’t contain), and when the user prompts the system with a question alongside many other entity mentions.

The whole system will be uploaded soon, and possibly more projects from other courses too!

Flowfields

2022-02-07T00:00:00+01:00

As part of my courses, I have studied different pathfinding techniques, but flowfields were not among them. These are used in videogames for crowd pathfinding as a cheaper alternative to running the A* algorithm for every agent, so I have implemented a basic demo of this technique in a concurrent environment (using Go, as it makes concurrency and synchronization extremely easy).
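
The reason a flowfield is cheaper for crowds is that the search runs once per goal instead of once per agent: every cell stores the step that leads towards the goal, and agents just look it up. Below is a minimal sketch of that idea, not the demo's actual Go code. It assumes a 4-connected grid with uniform movement cost (so a breadth-first search from the goal is enough), and `flow_field` and `Step` are names made up for this example.

```rust
use std::collections::VecDeque;

/// Step (dx, dy) an agent standing on a cell should take to approach the goal.
pub type Step = (i32, i32);

/// Builds a flow field with a breadth-first search that starts at the goal.
/// `blocked[y][x]` marks obstacles. Obstacles, unreachable cells and the goal
/// itself are left as `None`.
pub fn flow_field(blocked: &[Vec<bool>], goal: (usize, usize)) -> Vec<Vec<Option<Step>>> {
    let h = blocked.len();
    let w = if h == 0 { 0 } else { blocked[0].len() };
    let mut field: Vec<Vec<Option<Step>>> = vec![vec![None; w]; h];
    let mut seen = vec![vec![false; w]; h];

    let (gx, gy) = goal;
    if gy >= h || gx >= w || blocked[gy][gx] {
        return field;
    }
    seen[gy][gx] = true;
    let mut queue = VecDeque::from([(gx, gy)]);

    while let Some((x, y)) = queue.pop_front() {
        for (dx, dy) in [(1i32, 0i32), (-1, 0), (0, 1), (0, -1)] {
            let (nx, ny) = (x as i32 + dx, y as i32 + dy);
            if nx < 0 || ny < 0 || nx >= w as i32 || ny >= h as i32 {
                continue;
            }
            let (nx, ny) = (nx as usize, ny as usize);
            if blocked[ny][nx] || seen[ny][nx] {
                continue;
            }
            seen[ny][nx] = true;
            // The neighbour was discovered from (x, y), so it walks back towards it.
            field[ny][nx] = Some((-dx, -dy));
            queue.push_back((nx, ny));
        }
    }
    field
}
```

An agent at `(x, y)` then moves by `field[y][x]`, and every agent that shares the same goal can reuse the same field.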

This is a very limited demo as I did it in just a couple of days, and is loosely based on the guides from https://howtorts.github.io/

Its current limitations are:

There are no physics (velocity, collisions, etc.): every agent moves exactly one cell per movement, based on its current cell’s neighbours and on other agents which may be occupying those cells.
There is no prediction or look-ahead, since there is a hard constraint which mandates that no two agents can occupy the same cell at any given time.
The structure is designed for debugging rather than performance and usability:
- The solution is inherently concurrent but forces each agent to synchronize with a barrier in order to coordinate their movements and render them.
- Each agent has its own flowfield regardless of other agents that may have the same objective, and all agents also share one flowfield for tracking each other. These two structures should be merged to avoid duplicating data.
The obstacle functionality is implemented, but not tested.

The source is available here

CHIP8

2021-09-03T00:00:00+02:00

I have implemented a CHIP-8 VM, based on starrhorne’s amazing CHIP-8 emulator. I wanted to play with Rust’s low-level stuff and work on a small emulator, and this was a great choice. Since it is pretty much based on starrhorne’s project, I forked it. Some parts are the same source files with my own comments added and a few tweaks here and there, which don’t really change the overall behavior.

I added a few features, such as a view of the CPU registers, the stack contents and a small instruction history. Displaying this text requires a valid .ttf file, so I included one (Terminus TTF) in the project.

I also moved around some structural parts, such as the timers, which now reside in a separate thread running at 60 Hz.
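
As a rough illustration of that arrangement (not the fork's actual code), the sketch below keeps CHIP-8's delay and sound timers in atomics shared with a dedicated thread that decrements them about 60 times per second. `Timers` and `spawn_timer_thread` are hypothetical names, and a plain `sleep` loop is assumed, which drifts slightly compared to a deadline-based scheduler.

```rust
use std::sync::atomic::{AtomicBool, AtomicU8, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Delay and sound timers shared between the CPU loop and the timer thread.
#[derive(Default)]
pub struct Timers {
    pub delay: AtomicU8,
    pub sound: AtomicU8,
}

impl Timers {
    /// One 60 Hz tick: both timers count down and stay at zero once they reach it.
    pub fn tick(&self) {
        for timer in [&self.delay, &self.sound] {
            // `checked_sub` yields `None` at zero, which leaves the value untouched.
            let _ = timer.fetch_update(Ordering::SeqCst, Ordering::SeqCst, |v| v.checked_sub(1));
        }
    }
}

/// Spawns a thread that ticks the timers roughly 60 times per second
/// until `running` is set to false.
pub fn spawn_timer_thread(timers: Arc<Timers>, running: Arc<AtomicBool>) -> JoinHandle<()> {
    thread::spawn(move || {
        let period = Duration::from_nanos(1_000_000_000 / 60);
        while running.load(Ordering::SeqCst) {
            thread::sleep(period);
            timers.tick();
        }
    })
}
```

Because the timers tick on their own thread, the CPU loop can run faster or slower without changing how quickly the delay and sound timers expire.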

Finally, it is also possible to:

Pause the emulation by pressing the spacebar.
Increase the game’s frequency by pressing the Up arrow. (Doesn’t affect the timers)
Decrease the game’s frequency by pressing the Down arrow. (Doesn’t affect the timers)
Toggle sprite wrapping on or off via command-line arguments, as some games require wrapping and others do not.
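
To show what the wrapping toggle changes, here is a minimal sketch of a CHIP-8 sprite draw on the standard 64x32 display. It assumes the toggle controls whether pixels that cross a screen edge reappear on the opposite side or are simply dropped; `draw_sprite` and its `wrap` flag are illustrative names, not the emulator's actual function.

```rust
pub const WIDTH: usize = 64;
pub const HEIGHT: usize = 32;

/// XORs a sprite (one byte per row, most significant bit on the left) onto the screen.
/// Returns true when a lit pixel was switched off, which is what the VF flag reports.
pub fn draw_sprite(
    screen: &mut [[bool; WIDTH]; HEIGHT],
    x: usize,
    y: usize,
    sprite: &[u8],
    wrap: bool,
) -> bool {
    let mut collision = false;
    for (row, byte) in sprite.iter().enumerate() {
        for col in 0..8usize {
            let bit = 0x80u8 >> col;
            if (*byte & bit) == 0 {
                continue;
            }
            // The starting position always wraps; only overflowing pixels depend on `wrap`.
            let mut px = x % WIDTH + col;
            let mut py = y % HEIGHT + row;
            if px >= WIDTH || py >= HEIGHT {
                if !wrap {
                    continue; // clipping: pixels past the edge are dropped
                }
                px %= WIDTH;
                py %= HEIGHT;
            }
            collision |= screen[py][px];
            screen[py][px] ^= true;
        }
    }
    collision
}
```

Drawing the one-row sprite `0xC0` at `x = 63` lights column 63 in both modes, but column 0 only when `wrap` is true.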

Overall, this was a nice project to study and mess with. Maybe some day I will tackle a more complex emulator.

More information, including the project’s source code, is available here

BMO-Boy!

2021-08-22T00:00:00+02:00

One of my summer projects was to build a BMO robot, with the help of a 3D printer and a custom software stack I developed on my own. Today, I’m presenting my own customised BMO-Boy!

This project is based on Orbian’s BMO-Boy, with a few differences here and there.

I basically ditched the robotics and opted instead for a somewhat complicated software stack to power BMO, since I wanted to play with voice recognition and use somewhat complex libraries. Along the way, I also realized I could easily implement this in Rust, a language which I also wanted to tinker with this summer.

The result is this!

And here’s BMO saying hi!

Hardware details

My own BMO-Boy is using the following hardware parts, which differ a bit from Orbian’s design:

Instead of Adafruit’s 3.5” TFT screen, I used a suspiciously similar one found on Aliexpress, which differs only in the controller board used. Its only drawback is that its mounting holes are too small for the 3D model’s screw holes, so it will be crammed inside BMO without being mounted (not that it will have much space!)
I used a Raspberry Pi Zero W, and a ReSpeaker 2-Mics pHAT for audio input and output. Nothing else.
In order to save a little more money, I didn’t use any batteries. I opted instead to simply plug the Pi Zero into a charger, and the screen into a DC charger.
I bought a GPIO header extender in order to make space between the HAT and the Pi Zero for the RCA cables, which have to be soldered onto the Pi Zero in order to get video output to the screen. I used a solderless Dupont-to-female-RCA connector like this to simplify things.
I used M2.5 brass inserts and screws for fixing the Pi Zero and its HAT to the backplate, and also for fixing the backplate to the main body (since I only had M2.5 screws, I had to drill through its holes)

The HAT’s purpose is to pick up audio and stream it to a (hopefully) more powerful computer.

As for the 3D print, I simply printed Orbian’s model and painted it with Tamiya’s TS41 spray paint, which was the closest color to BMO’s I could get my hands on. Orbian has a handy build guide on YouTube, though the only relevant part here would be the painting.

The components barely fit inside, so maybe another 3D model could be better. Other contenders would be this one, which I attempted to print but didn’t have enough precision for the screws, and this improved version which I haven’t tried.

It is rather tight in there…

Software details

The software running BMO consists of three parts:

voice2json, an amazing piece of software that turns audio input into recognized (and configurable) intents, each with an associated confidence value, by making use of trained ML Speech Recognition models.
- voice2json will be in charge of parsing the audio input picked up by the Raspberry Pi’s HAT into intents.
- As its name indicates, the output will be in the form of JSON strings, so we will need a parser for it, which will be the client!
A simple client, which is in charge of:
- Listening for JSON input from stdin and parsing it into intents.
- Sending intents with enough confidence to the server.
A ~~somewhat misleading~~ server, which is in charge of:
- Displaying BMO’s face to the screen, and playing audio tracks.
- Listening over the network for intents, which dictate which faces and audio tracks should be played.
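
The client's filtering step can be pictured with the minimal sketch below. It is not the published client's code: it assumes the JSON has already been parsed into a small `Intent` struct (a hypothetical type with just a name and a confidence), and that "enough confidence" means meeting a configurable threshold.

```rust
/// A recognized intent as the client might hold it after parsing the JSON input.
/// Hypothetical type: only the two fields needed for filtering are modelled.
#[derive(Debug, Clone, PartialEq)]
pub struct Intent {
    pub name: String,
    pub confidence: f32,
}

/// Keeps only the intents confident enough to be worth sending to the server.
pub fn confident_intents(intents: Vec<Intent>, threshold: f32) -> Vec<Intent> {
    intents
        .into_iter()
        .filter(|intent| intent.confidence >= threshold)
        .collect()
}
```

For example, with a threshold of `0.8`, a `say_hello` intent recognized with confidence `0.93` would be forwarded, while an `anger` intent at `0.41` would be dropped.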

Both voice2json and the client will be running on a proper computer, and the server will be running on the Pi Zero. The reason for this is that the Pi Zero is not exactly powerful, and cramming machine learning, JSON parsing and a “game” into a single-core ARM CPU is not really a good idea.

The client and server are both written in Rust. While the client is somewhat simple, the server makes use of more complex libraries such as SDL (for displaying graphics) and SoLoud (for playing audio). It also makes extensive use of Rust’s amazing concurrency capabilities, which was a great learning experience!

Functionalities

BMO’s functionalities are the following:

It can recognise an arbitrary number of speech expressions, as long as they are associated with simple intents (anger, surprise, sing_a_song, say_hello…)
It can match an arbitrary number of faces and (optionally) audio tracks to each intent. The result is that BMO can respond to intents with:
- A randomised static face image among those associated with the intent, with a configurable time limit.
- A series of (also randomised) faces switching at fixed intervals while an audio track plays in the background (In other words, BMO can speak anything in any way you want!)
- An associated function. Currently, BMO can:
  - Display the current weather. (This can be disabled, since it requires an API key from OpenWeatherMap)
  - Configure (by voice!) and display a chronometer, with a configurable alarm playing at the end.

The setup

All software is intended to run on Linux (voice2json included).

On the Pi Zero’s side, any Linux distribution should work, as long as it has the SDL2, SDL2-image, SDL2-ttf and OpenAL libraries required to run the server. I used a headless Raspbian image with no DE installed, which I controlled via SSH. bmOS_server should automatically pick up the display and show itself in fullscreen mode.
On the “auxiliary” host’s side, again any distribution should work, as long as it can run voice2json. I used Arch Linux alongside a Docker image for voice2json. There are native voice2json packages for Debian available, too.
An external solution is needed to pick up and stream audio from the Pi Zero’s microphones. I simply used a bash script, run from the host which is running the client, to record and transmit audio via SSH. The script is detailed in the client’s documentation, but any other solution can be used as long as its output is accepted by voice2json.
In order to get audio from the Raspberry Pi and hear BMO’s voice, the 3.5mm jack on the HAT can be used, but it adds yet another cable to manage. I chose to simply stream it too.

Source code

Everything is open source, so anyone can play with it. I doubt anyone will want to make use of this since it is quite a simplistic and specific project, but who knows!
The source code for the client is hosted on GitHub and crates.io, with its more in-depth documentation hosted on docs.rs
The source code for the server is also hosted on GitHub and crates.io, with its documentation hosted on docs.rs
voice2json, synesthesiam’s project, is hosted here, with its documentation hosted on its main webpage.

Assets

Since all assets should be for personal use, I didn’t include them in the source code in order to avoid any potential problems. I simply ripped audio tracks from the show, and used Orbian’s face assets hosted here, along with a few of my own, so that there could be more variety when talking. Nonetheless, feel free to send me an email at bmo@samueldgv.com if you want my assets.