Scrape Twitter to analyse product performance with an LLM

Sep 20, 2024

The idea for this project came to me while I was looking through LangChain’s list of integrated tools. It turned out to be really easy and fun. I love how LLMs make sentiment analysis such a breeze!

First, let’s talk briefly about methodology. Then, as a quick demo, we’ll look at what the Twitter community is saying about 3 popular AI products: Cursor, Replit, and OpenAI’s latest release, o1. Finally, we’ll cover where you can apply this method.

Part 1: How to set it up

I probably spent the most time trying to figure out how to get Twitter data for free. Twitter’s developer API has some of the most developer-unfriendly pricing I have come across ($100+/month). In the end, I decided to cheat with Apify (a web scraper aggregator). You get $5 of free credit a month, which is just right for this prototype.

For each of the 3 products, I scraped its tweet search results page, limiting the output to a single day to control volume, then simply loaded the results into a Jupyter notebook. Each scraped tweet looks like this:

Next, I fed the tweet text to an LLM to extract the following summaries:

Sentiment (Do more people like the product or hate it?)
Tools mentioned (For my own inspiration to see what other hot AI tools are there)
Task described (What the user was trying to do with the product)

All of this can be done by defining a single pydantic BaseModel for LangChain to use, like this:
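A minimal sketch of such a model might look like the following. The class name, field names, and sentiment labels are illustrative assumptions, chosen to mirror the three summaries listed above. The field descriptions matter because LangChain passes them to the LLM as guidance on what to put in each field.

```python
from typing import List, Literal

from pydantic import BaseModel, Field


class TweetSummary(BaseModel):
    """Structured summary of a single tweet (names here are hypothetical)."""

    # Restricting sentiment to a fixed set of labels makes it easy to tally later.
    sentiment: Literal["positive", "neutral", "negative"] = Field(
        description="Overall sentiment of the tweet towards the product."
    )
    tools_mentioned: List[str] = Field(
        description="Other tools or products named in the tweet; empty list if none."
    )
    task_described: str = Field(
        description="What the user was trying to do with the product; empty string if unclear."
    )
```

All three fields are required here, so the LLM has to return an explicit empty list or empty string rather than omitting a field.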

Then ask the LLM to reply to us in this structured format (highlighted in the code below).
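A minimal sketch of what such a function could look like, under a few assumptions: `llm` is any LangChain chat model that supports `with_structured_output`, `schema` is your pydantic model, and the function name and prompt wording are made up for illustration.

```python
from typing import Any, Iterable, List

# Prompt wording is an assumption; adjust it to your own use case.
PROMPT_TEMPLATE = (
    "You are analysing a tweet about the product '{product}'.\n"
    "Summarise it using the requested structured format.\n\n"
    "Tweet: {tweet}"
)


def analyse_tweets(
    llm: Any, schema: type, product: str, tweets: Iterable[str]
) -> List[Any]:
    """Return one structured summary per tweet text."""
    # with_structured_output binds the schema to the model, so each reply
    # is parsed into an instance of `schema` instead of free text.
    structured_llm = llm.with_structured_output(schema)
    results = []
    for tweet in tweets:
        prompt = PROMPT_TEMPLATE.format(product=product, tweet=tweet)
        results.append(structured_llm.invoke(prompt))
    return results
```

With a real model, this could be called with, for example, a `ChatOpenAI` instance from the `langchain_openai` package as `llm`. That provider is only one option; the approach does not depend on it.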

There you go. That’s the only function I wrote, more or less, and I used it to run through the lists of tweets on Cursor, Replit and o1 respectively. Easy breezy, right? Even better, the LLM is multilingual, so it doesn’t matter if the tweet is in English, Spanish or Japanese.

Part 2: Analysis results

Overall, sentiment is not bad for any of the 3 products. Relative to one another, I think the results make a lot of sense too. Cursor (an AI code editor) was raved about by Karpathy. Replit, as a broader AI development and deployment platform, is probably harder to nail. There have been both good and bad reactions to OpenAI’s o1 from different places, but here the wisdom of the crowd seems to suggest the new model holds great promise, which is corroborated by lmsys’s community votes.
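Comparing products this way comes down to tallying the sentiment labels per product. A minimal sketch of that step, assuming each analysed tweet has already been reduced to a single label string (the function name and the labels in the usage note are hypothetical):

```python
from collections import Counter
from typing import Dict, Iterable


def sentiment_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of tweets carrying each sentiment label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        # No tweets analysed, so there is nothing to compare.
        return {}
    return {label: count / total for label, count in counts.items()}
```

Running this once per product, for instance on the list of `"positive"`, `"neutral"` and `"negative"` labels collected for Cursor, gives shares that can be compared directly even when the products have different tweet volumes.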

I also asked the LLM to look through the texts and pull out the 10 most interesting projects/ideas people are working on with each tool. Here are the outputs for inspiration:

Part 3: Where this can be useful

Like it or not, Twitter is probably the social network with the highest concentration of tech people. Analysing tweets is not new, but LLMs make the process easy, customisable, and high-throughput.

If you are an investor, this helps you do market research, source and shortlist deal targets, and keep track of how your portfolio companies are doing without the need for expensive surveys.

If you are an operator, this can at least help you with:

understanding how your product is doing vs. others
engaging with your community more efficiently (auto-classifying and replying to tweets)
growth hacks (e.g. searching for tweets about pain points your product can address, sourcing influencers to partner with)

These are applications off the top of my head. I’m sure you have really brilliant ideas. Let me know in the comments below!

Have fun!

Yue’s Substack
