Public Projects

Here are some projects I am highlighting, along with associated publications. Many are in the computer vision / multi-modal ML/AI sphere, and some are in the area of procedural content generation (PCG). I am interested in the intersection of the two areas, as well as how they tie into theories of generative worlds in general.

My internal, company-specific projects are not listed here for IP reasons, but hopefully this gives you an idea of my interests!

Implementing Handwriting Recognition in The New York Times Crossword App

I explored the ability to write by hand in The New York Times Crossword within our Android Games app using AI and machine learning. I was honored to share my platform-specific experience implementing on-device ML in the Android Crossword app. The coverage below gives a comprehensive breakdown of how one might implement such a system from the native, on-device model end.
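The platform specifics are covered in the articles below, but as a rough illustration of the model end, here is a minimal Python sketch of running a stroke-based handwriting classifier through the TensorFlow Lite interpreter, the same runtime family used for on-device inference on Android. The model file, input shape, and A-Z label mapping are hypothetical placeholders, not the production system's.

```python
import numpy as np
import tensorflow as tf

# Hypothetical on-device model: consumes one resampled pen stroke as
# (x, y, pen_up) triples and emits per-letter probabilities.
interpreter = tf.lite.Interpreter(model_path="handwriting.tflite")  # placeholder file
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def recognize_letter(stroke: np.ndarray) -> str:
    """stroke: float32 array already shaped to the model's expected input."""
    interpreter.set_tensor(input_details[0]["index"], stroke.astype(np.float32))
    interpreter.invoke()
    probs = interpreter.get_tensor(output_details[0]["index"])[0]
    return chr(ord("A") + int(np.argmax(probs)))  # crossword squares are A-Z
```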

Below is the coverage of this work:

Publication on Nieman Lab: Article on handwriting recognition in The New York Times Crossword on Nieman Lab.

Publication in The New York Times: Article on handwriting recognition in The New York Times Crossword on The New York Times.

Neural Style Transfer: Image Attribution for Artist Compensation

In this project, I investigated the principles of Neural Style Transfer, a well-known AI technique developed in 2015 that extracts the style of one piece of art and applies it to another. The technique is often misused to pilfer an artist's style, creating pieces that are attributed under their name and sold in their style.

We built a system that can perform Neural Style Transfer in a limited sandbox for demonstration purposes only, but that can also detect style percentages, using CLIP and the structural similarity index of style layers to create a weighted attribution score.
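The paper will lay out the full method; the sketch below only illustrates the scoring idea under stated assumptions. It combines a global CLIP embedding similarity with the structural similarity (SSIM) of early VGG-19 style-layer activations; the model checkpoints, the choice of layers, and the 50/50 weighting are illustrative placeholders, not the values from our system.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from skimage.metrics import structural_similarity as ssim
from torchvision import models, transforms
from transformers import CLIPModel, CLIPProcessor

clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
# VGG-19 conv stack through relu3_1, a common "style layer" cutoff.
vgg = models.vgg19(weights="DEFAULT").features[:12].eval()
to_tensor = transforms.Compose([transforms.Resize((256, 256)), transforms.ToTensor()])

def attribution_score(artist_img: Image.Image, candidate_img: Image.Image,
                      w_clip: float = 0.5, w_ssim: float = 0.5) -> float:
    # Global style similarity from CLIP image embeddings.
    inputs = clip_proc(images=[artist_img, candidate_img], return_tensors="pt")
    with torch.no_grad():
        emb = clip_model.get_image_features(**inputs)
    clip_sim = F.cosine_similarity(emb[0:1], emb[1:2]).item()

    # Structural similarity of channel-averaged style-layer activations.
    with torch.no_grad():
        fa = vgg(to_tensor(artist_img).unsqueeze(0)).mean(dim=1)[0].numpy()
        fb = vgg(to_tensor(candidate_img).unsqueeze(0)).mean(dim=1)[0].numpy()
    style_sim = ssim(fa, fb, data_range=float(fa.max() - fa.min()) or 1.0)

    # Blend the two signals into one weighted attribution score.
    return w_clip * clip_sim + w_ssim * style_sim
```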

The paper for this work is forthcoming, but in the meantime we wanted to get a baseline system started for philanthropic reasons, and worked on this Charming Data initiative to do so.

Charming Data Project Link: https://charming-data.circle.so/c/ai-python-projects/front-end-for-neural-style-transfer

Detectron2 Framework for Video Segmentation, Background Removal, and Panoptic Segmentation

In this work, I use Meta's Detectron2 framework to take video of an urban scene and perform a range of neural image-processing tasks using models from the R-CNN family.

First the video was broken up into individual frames using FFmpeg, and then each frame was input into the model and transformed accordingly.
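A minimal sketch of that pipeline, assuming a local video file (the file names here are placeholders) and the COCO-pretrained Mask R-CNN from the Detectron2 model zoo; for the panoptic variant you would swap in a panoptic config from the same zoo:

```python
import glob
import os
import subprocess

import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data import MetadataCatalog
from detectron2.engine import DefaultPredictor
from detectron2.utils.visualizer import Visualizer

os.makedirs("frames", exist_ok=True)
os.makedirs("out", exist_ok=True)

# 1. Split the video into individual frames with FFmpeg.
subprocess.run(["ffmpeg", "-i", "urban_scene.mp4", "frames/frame_%04d.png"], check=True)

# 2. Load a COCO-pretrained Mask R-CNN from the Detectron2 model zoo.
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
predictor = DefaultPredictor(cfg)

# 3. Run each frame through the model and save the visualized result.
metadata = MetadataCatalog.get(cfg.DATASETS.TRAIN[0])
for path in sorted(glob.glob("frames/frame_*.png")):
    frame = cv2.imread(path)                       # BGR, as DefaultPredictor expects
    outputs = predictor(frame)
    vis = Visualizer(frame[:, :, ::-1], metadata)  # Visualizer wants RGB
    out = vis.draw_instance_predictions(outputs["instances"].to("cpu"))
    cv2.imwrite(path.replace("frames/", "out/"), out.get_image()[:, :, ::-1])
```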

Detectron2 is very powerful for all kinds of visual object recognition tasks, as well as actor detection and localization, and can be deployed on-device to create small but powerful object-understanding systems.

I trained these R-CNN variants on the COCO dataset, but from what I've seen, you could train this model on any number of medical and disease datasets and obtain quite useful disease-detection results.

Here are the GitHub repository and Colab notebook where I set up Detectron2 to do this work.

GitHub: https://github.com/squoraishee/detectron_visual_pipeline

Colab: https://colab.research.google.com/drive/163oMU6scjcD2UlGgXUNFchDxTzoqdkCl#scrollTo=b6EJ0FR7zHak

Graph Grammars for Complex City Plan Implementation

In this project, I used graph grammars to explore how city layouts and infrastructure planning can be modeled and analyzed. Graph grammars, which define transformation rules for graph structures, are an effective way to simulate and design urban layouts, representing streets, buildings, parks, and transportation networks as interconnected nodes and edges.

I implemented a system that applies graph grammar rules to create and refine city plans. Starting with basic layouts, I used rules to add or modify roads, connect residential and commercial areas, and optimize traffic flow; a minimal sketch of one such rule appears below.
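This sketch uses networkx, and the rule and node attributes are simplified stand-ins for the ones in the project: it subdivides a road edge with a new intersection and grows a residential branch off it.

```python
import random

import networkx as nx

def expand_road(g: nx.Graph, rng: random.Random) -> None:
    """Grammar rule: subdivide a road edge with a new intersection,
    then grow a residential node off the new intersection."""
    roads = [(u, v) for u, v, d in g.edges(data=True) if d["kind"] == "road"]
    if not roads:
        return
    u, v = rng.choice(roads)
    mid = g.number_of_nodes()   # fresh node ids are just counters
    house = mid + 1
    g.remove_edge(u, v)
    g.add_node(mid, kind="intersection")
    g.add_node(house, kind="residential")
    g.add_edge(u, mid, kind="road")
    g.add_edge(mid, v, kind="road")
    g.add_edge(mid, house, kind="street")

# Axiom: a single road between two intersections; then iterate the rule.
city = nx.Graph()
city.add_nodes_from([(0, {"kind": "intersection"}), (1, {"kind": "intersection"})])
city.add_edge(0, 1, kind="road")
rng = random.Random(42)
for _ in range(10):
    expand_road(city, rng)
print(city.number_of_nodes(), "nodes,", city.number_of_edges(), "edges")
```

Each of the other rules (connecting commercial zones, rewiring edges to improve flow) is just another graph transformation of the same shape.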

The transformations modeled the iterative process city planners might use to expand neighborhoods or adjust infrastructure in response to population growth. The examples in the accompanying video demonstrate how the rules can create realistic city grids or improve connectivity.

Through simulating scenarios like adding new public transit routes or optimizing traffic patterns, the approach offers a practical tool for visualizing and refining ideas in urban design.

Simulated Cavern System Through Cellular Automata

In this project, I worked on creating cave-like structures using cellular automata, a grid-based system where each cell follows simple rules to change its state. The goal was to generate natural-looking cave patterns that could be used in games, simulations, or other applications. By starting with a random grid and applying these rules, the process mimics how natural caves might form, resulting in unique, organic layouts.

The project began with a random grid of cells representing open spaces and walls. Using rule-based iterations, the grid evolved over time to form cohesive and realistic cave patterns. The key rules focused on determining whether a cell became part of the cave or remained a wall, based on the number of neighboring cells in a similar state. Through repeated iterations, these rules smoothed the layout, creating interconnected pathways and distinct chambers.
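Those rules are compact enough to show in full. Below is a minimal sketch of the generator; the 45% initial fill and the 5-of-9 neighbor threshold are common defaults for this technique, not necessarily the exact values I tuned.

```python
import numpy as np

def generate_cave(width=64, height=48, fill=0.45, steps=5, seed=0):
    """Cellular-automata cave: 1 = wall, 0 = open space."""
    rng = np.random.default_rng(seed)
    grid = (rng.random((height, width)) < fill).astype(int)
    for _ in range(steps):
        # Count wall cells in each 3x3 neighborhood (self included),
        # treating everything beyond the border as wall.
        padded = np.pad(grid, 1, constant_values=1)
        neighbors = sum(
            padded[dy:dy + height, dx:dx + width]
            for dy in range(3) for dx in range(3)
        )
        # A cell becomes (or stays) wall if 5+ of the 9 cells are walls;
        # repeated passes smooth noise into chambers and corridors.
        grid = (neighbors >= 5).astype(int)
    return grid

cave = generate_cave()
print("\n".join("".join("#" if c else "." for c in row) for row in cave))
```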

This method showed how straightforward rules can produce complex and natural-looking results. By adjusting factors like grid size or the number of iterations, I could generate a variety of cave layouts, from dense labyrinths to more open spaces. It’s an effective way to create unique, functional environments for games or other creative projects without requiring manual design work.

Video Understanding, Visual Question Answering, and Alt Text Generation

In this project, I built a basic visual question answering (VQA) system using the BLIP model from Salesforce, which takes in an image and allows a user to interrogate it for different qualities. The answers can get quite granular: with the right training, the model is able to understand complex details about the image, for example what the street a couple is walking on is made of, or what the couple pictured in the image are holding and wearing.
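A minimal version of the VQA piece, using the publicly released BLIP weights on Hugging Face (the image URL and question here are placeholders), looks roughly like this:

```python
import requests
from PIL import Image
from transformers import BlipForQuestionAnswering, BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

# Placeholder image; any RGB photo works.
image = Image.open(requests.get(
    "https://example.com/couple_walking.jpg", stream=True).raw).convert("RGB")

question = "What is the street made of?"
inputs = processor(image, question, return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))  # short free-text answer
```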

The second part of this project was a video understanding system built on the I3D CNN architecture, trained on the Kinetics-500 dataset. The model can pick out fine-grained details from a video and determine the set of actions being undertaken in it.

It also learns sequences of actions that occur in different parts of the video, with a score associated with each segment. This type of system can be used for all kinds of applications, including visual investigations.
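As a rough sketch of how an I3D-style model consumes video, here is the Kinetics-400 I3D checkpoint from PyTorchVideo's hub used as a stand-in, not the project's exact model or dataset. Running it over sliding windows of frames is what yields the per-segment action scores described above.

```python
import torch

# Stand-in: Kinetics-400 I3D (ResNet-50 backbone) from PyTorchVideo's hub.
model = torch.hub.load("facebookresearch/pytorchvideo", "i3d_r50", pretrained=True)
model.eval()

# A clip is a batch of shape (B, C, T, H, W); placeholder random frames here.
clip = torch.rand(1, 3, 8, 224, 224)  # 8 RGB frames at 224x224
with torch.no_grad():
    logits = model(clip)
probs = torch.softmax(logits, dim=1)
top5 = probs.topk(5, dim=1)
print(top5.indices[0].tolist())  # indices into the Kinetics class list
```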

Both are open-source models, making them essentially no- or low-cost solutions that let you build your own system from scratch without requiring an expensive closed-source API.

Simulation of Reinforcement Learning on a Simple “Car” Track in TensorFlow.js

This is a simple reinforcement learning scenario that I put together using TensorFlow.js. The purpose is to demonstrate a simple “car” attempting to stay within the boundary of a track.

The car is punished whenever it exits the boundary and rewarded whenever it stays within it, in line with its basic policy function. When punished, the car turns red; when in the clear, it turns blue.

The car really only has control of one parameter that it can adjust to stay in line with, or deviate from, the policy function: its next incident angle, which it can change by fractions of a radian in order to stay within the track boundary.
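The live demo is written in TensorFlow.js, but the core loop is small enough to sketch in Python. Everything here is a simplification of the demo, not its actual code: the track is an idealized ring, and the learning rule is crude hill-climbing on the steering adjustment rather than a full policy gradient.

```python
import math
import random

# Simplified ring track: the car must keep its radius between these bounds.
R_INNER, R_OUTER = 4.0, 6.0
SPEED, STEP = 0.1, 0.05  # distance per tick, max steering change (radians)

x, y, heading = 5.0, 0.0, math.pi / 2  # start on the track, moving tangentially
steer_bias = 0.0                        # the one learned parameter

for tick in range(10000):
    # Perturb the steering bias by a fraction of a radian and take a step.
    trial = steer_bias + random.uniform(-STEP, STEP)
    heading += trial
    x += SPEED * math.cos(heading)
    y += SPEED * math.sin(heading)

    r = math.hypot(x, y)
    on_track = R_INNER <= r <= R_OUTER
    reward = 1.0 if on_track else -1.0  # red when punished, blue when clear

    # Crude hill-climbing: nudge the bias toward perturbations that paid off.
    if reward > 0:
        steer_bias = 0.9 * steer_bias + 0.1 * trial
    else:
        x, y, heading = 5.0, 0.0, math.pi / 2  # reset after leaving the track
```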