Building an NLP Summarizer for Ballots in 2019

Can NLP Make Voting Less Painful?

Let's be honest: have you ever voted for a "superintendent of the local water tank" based on the word count of their intro? You're supposed to vote well, but reading the fine print is a rather silly pain point.

It was 2019 when I started Apella to tackle this with NLP. NLP can summarize the source material, lay out the pros and cons, and run sentiment analysis on "reviewers" from all sides. My vote shall no longer be swayed by word count!


The Webapp: Django

The webapp was built with Django. It was easy to spin up, with lots of security features right out of the box.

The core of the application was a set of interconnected models that represented the key business concepts:

  • Organization: An interest group, such as a city council or an advocacy group.
  • Topic: The specific issues or ballot measures up for debate and vote.
  • Campaign: The ballot to be voted on.
  • Post and Comment: User-generated content for discussions.

The NLP: Summaries and Sentiment

To make the information more digestible, Apella used Natural Language Processing (NLP) in two key ways:

1. Ballot Summarization

The summary field in the Topic and Campaign models was designed to hold automatically generated summaries of lengthy ballot measures. This was achieved with libraries like spaCy and NLTK performing extractive summarization: identifying the most important sentences in a text and stitching them into a concise overview. This script was meant to run on a separate server from the webapp's (i.e., batch jobs from my laptop).

# From Apella/topics/models.py
class Topic(models.Model):
    name = models.CharField(max_length=255)
    summary = models.TextField()
    ...
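The gist of extractive summarization can be sketched with a frequency-based scorer. The real pipeline used spaCy/NLTK for tokenization and sentence splitting; this standard-library version (all names here are mine, not the project's) just shows the technique:

```python
import re
from collections import Counter

def summarize(text: str, n_sentences: int = 2) -> str:
    """Pick the n highest-scoring sentences and return them in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Word frequencies over the whole document.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    # Score each sentence by the summed document frequency of its words.
    scores = {
        i: sum(freq[w] for w in re.findall(r"[a-z']+", s.lower()))
        for i, s in enumerate(sentences)
    }
    # Take the top-n sentence indices, then restore original order.
    top = sorted(sorted(scores, key=scores.get, reverse=True)[:n_sentences])
    return " ".join(sentences[i] for i in top)

measure = (
    "Ballots matter. The measure funds water infrastructure for the water district. "
    "Water rates may rise. Cats are nice."
)
print(summarize(measure))  # keeps the two water-heavy sentences
```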

2. Sentiment Analysis

The ExplorePost model included a sentiment analysis feature to gauge public opinion. By tracking user interactions (likes and dislikes) and analyzing the affect expressed in each review post, we could get a pulse on how the community felt about different topics. This was implemented with libraries like TextBlob, which provides a simple API for sentiment analysis.

# From Apella/topics/models.py
class ExplorePost(models.Model):
    content = models.TextField(blank=True, verbose_name="Explore Post")
    ...
    liked = models.BooleanField(verbose_name="Positive sentiment", default=False)
    unliked = models.BooleanField(verbose_name="Negative sentiment", default=False)
    ...
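A rough sketch of how the two signals (like/dislike counts and text polarity) might be blended into one score. The 50/50 weighting is my assumption, not the project's; in Apella, the polarity half would come from TextBlob's `TextBlob(text).sentiment.polarity`:

```python
def post_sentiment(likes: int, dislikes: int, text_polarity: float) -> float:
    """Blend a vote ratio and a text-polarity score, both in [-1, 1].

    text_polarity would come from e.g. TextBlob(text).sentiment.polarity;
    the equal weighting here is purely illustrative.
    """
    total = likes + dislikes
    # Vote ratio: +1 if all likes, -1 if all dislikes, 0 if no votes.
    vote_score = (likes - dislikes) / total if total else 0.0
    return 0.5 * vote_score + 0.5 * text_polarity

print(post_sentiment(8, 2, 0.4))  # 0.5: mildly positive on both signals
```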

The DevOps: Heroku

Back then, Heroku still had a free tier, and it threw in free database servers! I hadn't learned Docker yet, so every deploy was a manual rebuild of the Django server on the VPS. It was rough. With a Procfile defining the application's processes and a requirements.txt managing Python dependencies, the deployment process was at least functional. The remaining DevOps work was simply Google Analytics and serving the static assets from Google's CDN.
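For reference, a Heroku Procfile and requirements.txt for a Django app of that era typically looked something like this. The post doesn't show the actual files, so treat these as representative rather than Apella's exact contents:

```
# Procfile: tell Heroku to serve the app with gunicorn
web: gunicorn Apella.wsgi --log-file -

# requirements.txt: pinned Python dependencies (versions illustrative)
Django==2.2
gunicorn
dj-database-url
psycopg2-binary
```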

This DevOps design wouldn't have scaled: no CI/CD workflows on commit, no unit tests, no hardened tunnels between the webapp server and the ML server, no message queues, and a million other things. But it was a PoC!

Figure 1: A quick screenshot. Bootstrap for the frontend. Good times.

Civic Tech

Apella was an ambitious project, and while it never reached its full potential, I'm still proud of it. I learned a lot from putting myself out there and meeting folks in person to get buy-in from interest groups and council members. Looking back, I spun Apella up right before COVID and the ChatGPT craze. I'm actually surprised, and somewhat saddened, to see that the same technical and non-technical barriers I ran into are still present. You would think that with all these LLMs floating around, the flow of talent and capital would have shifted by now from nudging retirees and social media bubbles to a more sanitized, thorough, and robust way of informing the overall population of voters. Maybe someday it will.


Ben Truong
ML Engineer

I build ML/GenAI apps for the cloud and private clusters.
