Reading and Writing Electronic Texts - Week 08
- Mar 27
Assignment 03:
After getting a bit lost with the second assignment and being humbled by trying to write code from scratch with no LLMs, I decided to make something simple, mostly based on the references from class, and dedicate more time to figuring out what kind of text generation I want to achieve.
I had two ideas in mind for the assignment.
The first one: taking a text in Biblical Hebrew, Ecclesiastes, and filtering it for root letters. Semitic languages are built on a (mostly) three-letter root system, in which three root consonants, sometimes with added vowel letters, are slotted into existing patterns. My thought was to generate a short story from some of the n-grams that are roots shared by Hebrew and Arabic. I thought these roots would be interesting because they could be a way to convey an educational story for kids or adults learning Arabic: some words are parallel, while others have high semantic similarity. Another aspect that made me want to explore this filtering of roots is that many of them are homographs when written as a three-letter word, and can be read in four or more ways, each with a different meaning.
The second one: I chose Ecclesiastes because I was also interested in exploring how common these roots are along the way (I thought maybe it would teach me about the text). I thought about the book being part of the Axial Age, and that it would be interesting to compare the most common n-grams and explore their filtered wisdom in light of language usage.
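Counting the most common n-grams could look something like the sketch below. It is only an illustration under a few assumptions of mine: the text is plain Unicode Hebrew, vowel points and cantillation marks are stripped, final-form letters are folded into their regular forms so that word-final trigrams compare equal to word-internal ones, and n-grams never cross a word boundary. The names `char_ngrams`, `MARKS` and `FINALS` are made up for this example, and the sample line is a simplified stand-in for the real text.

```python
from collections import Counter
import re

# Points and accents live in U+0591..U+05C7; the maqaf (U+05BE) joins words,
# so it is turned into a space before the marks are stripped.
MAQAF = "\u05BE"
MARKS = re.compile(r"[\u0591-\u05C7]")
HEBREW_WORD = re.compile(r"[\u05D0-\u05EA]+")

# Assumption: fold final-form letters into their regular forms.
FINALS = str.maketrans("ךםןףץ", "כמנפצ")

def char_ngrams(text, n=3):
    """Yield every n-letter slice that sits inside a single word."""
    text = MARKS.sub("", text.replace(MAQAF, " ")).translate(FINALS)
    for word in HEBREW_WORD.findall(text):
        for i in range(len(word) - n + 1):
            yield word[i:i + n]

# Simplified sample line, not the full text of the book.
sample = "הבל הבלים אמר קהלת הבל הבלים הכל הבל"
counts = Counter(char_ngrams(sample))
print(counts.most_common(3))
```

Keeping the n-grams inside word boundaries is a design choice: it avoids counting letter sequences that only exist because two words happen to sit next to each other.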
This is still a WIP. I couldn't find a way to "clean" the words of Semitic features like possessive pronouns and prepositions, which in Semitic languages attach to the main word. One idea I have had is to take a free Hebrew dictionary text source I found and filter the three-letter n-grams against it, but I wasn't sure how to do it.
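One possible way to combine the two ideas (peeling off attached letters, then checking against a dictionary) is sketched below. Everything here is an assumption for illustration: `DICTIONARY` is a tiny hand-typed stand-in for the dictionary file, the prefix and suffix lists are a rough, incomplete guess at common attached particles and pronoun or plural endings, and the function names are hypothetical. The heuristic deliberately over-generates candidates and lets the dictionary lookup decide which ones survive.

```python
# Fold final-form letters so stems compare equal wherever they were cut.
FINALS = str.maketrans("ךםןףץ", "כמנפצ")

def normalize(word):
    return word.translate(FINALS)

# Assumption: a rough list of one-letter particles that attach to the front.
PREFIX_LETTERS = set("והבכלמש")
# Assumption: a rough list of pronoun and plural endings ("" means no suffix).
SUFFIXES = tuple(normalize(s) for s in
                 ("", "י", "ך", "ו", "ה", "ם", "ן", "נו", "כם", "כן", "הם", "הן", "ים", "ות"))

def candidate_stems(word, max_prefix=2):
    """Return every stem left after peeling 0..max_prefix prefix letters and one suffix."""
    word = normalize(word)
    stems = set()
    for p in range(max_prefix + 1):
        if any(ch not in PREFIX_LETTERS for ch in word[:p]):
            break  # the next letter cannot be a prefix, so stop peeling
        rest = word[p:]
        for suffix in SUFFIXES:
            if rest.endswith(suffix):
                stem = rest[:len(rest) - len(suffix)]
                if stem:
                    stems.add(stem)
    return stems

def known_three_letter_stems(words, dictionary):
    """Map each word to its three-letter stems found in the dictionary (normalized forms)."""
    known = {normalize(entry) for entry in dictionary}
    hits = {}
    for word in words:
        matches = {s for s in candidate_stems(word) if len(s) == 3 and s in known}
        if matches:
            hits[word] = matches
    return hits

# Hypothetical stand-in for the dictionary source: in practice, one entry per line of a file.
DICTIONARY = {"אמר", "הלך", "שמש", "עמל"}
words = ["ובעמלו", "שאמר", "השמש", "עמלם", "קהלת"]
hits = known_three_letter_stems(words, DICTIONARY)
for word, stems in hits.items():
    print(word, "->", sorted(stems))
```

Because a prefix letter can also be a genuine root letter, this will sometimes strip too much or too little, so the results would still need a manual pass.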

I got this list of words. Around 40% of them are actual words in Hebrew, and around 50% of those are roots that can be read as verbs (and not just nouns or prepositions).
I later filtered the list manually, since I wasn't sure how to do it in code, and printed the story again.

This is a free translation of the manually filtered words. It could be read in many ways; this one felt (relatively) coherent to me. I still need to understand how to filter out n-grams that look like a root but may just be a random part of another word, since pattern letters can be confused with root letters. So while an n-gram can be a valid root, it doesn't necessarily represent the true meaning of the text.
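A first, very rough step toward separating real roots from accidental slices might be to record where each trigram came from and to flag the ones that never appear as a complete word in the text. This is a sketch under my own assumptions, not a linguistic analysis: the function names are hypothetical, the word list is a toy sample, and "appears as a whole word" is only weak evidence that a trigram is a meaningful root.

```python
from collections import defaultdict

def trigram_sources(words):
    """Map each three-letter slice to the set of words it was cut from."""
    sources = defaultdict(set)
    for word in words:
        for i in range(len(word) - 2):
            sources[word[i:i + 3]].add(word)
    return sources

def split_by_evidence(words):
    """Split trigrams into those seen as a whole word and those only seen inside longer words."""
    sources = trigram_sources(words)
    standalone = {gram for gram, origin in sources.items() if gram in origin}
    embedded_only = set(sources) - standalone
    return standalone, embedded_only

# Toy sample: one three-letter word and two longer words.
words = ["הבל", "הבלים", "קהלת"]
standalone, embedded_only = split_by_evidence(words)
print("standalone:", sorted(standalone))
print("embedded only:", sorted(embedded_only))
```

The embedded-only group is where pattern letters are most likely to be mistaken for root letters, so it is a natural place to start reviewing by hand.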
This is the URL for the Jupyter notebook:


