Skip to content

Dumping The Donald

Written by

Viking Sec


I’ve started a weekly newsletter! If you want updates on this blog or my research, or just a weekly dose of the going’s on in information security and threat intelligence, you can subscribe here.

After the January 6th attack on the capitol, and after many of the events of the last few months and years, I’ve become increasingly worried about the rise of fascism. It’s one of the reasons I started this blog, and it’s spurred my own political evolution over the last year. After the sixth, though, I knew I had to do more than donating and spreading awareness. So, I set my sights on TheDonald.

The Donald’s Checkered Past

The Donald started as a subreddit, /r/TheDonald, that had a less than stellar history. Reddit censored and warned the platform on multiple occasions for failing to moderate hate speech, threats of violence and constant cross-sub harassment. Finally, in an overall purge of less-than-savory material, Reddit finally permabanned TheDonald after it had become one of the largest MAGA platforms on the web at the time.

At this point in time, Gab was growing and Parler was yet to be created if I recall correctly, so the former grew and many of TheDonald’s users vowed a return. Thus, TheDonald(.)win appeared, a blatantly obvious Reddit twin with little to no content moderation requirements. Threats of violence flowed like cheap light beer in a MAGA frat party as users discovered that their hosting providers, OVH, and their DDoS protection provider, CloudFlare, would pretty much let them get away with anything… and still do.

In the leadup to the January 6th capitol riot, the violent rhetoric on TheDonald increased. The attacks have since been praised by many on TheDonald, with continuing violence being discussed and supported by users on the site.

CloudFlare actively ignoring violence on one of their customer sites

So, what to do?

Dumping on The Donald

So, what better than to make sure that if or when TheDonald disappears, we still have access to the data? Make it searchable, archivable and easy to research for OSINT analysts and antifascist researchers alike. So I set about creating a scraper. If you’re bored by technical details, feel free to skip to the bottom where I give you access to all the data I’ve scraped, but if you want to know how I did it, stick around.

By way of disclaimer, I don’t think this is the best way to go about this. It’s the way I did it, but it’s slow and not super well automated. I plan on making some tweaks over the coming week as I have time, but it’s allowed me to scrape well over 10 thousand posts and links.

CloudFlare made it quite a bit more difficult to scrape this site. Normally, I would write a scraper first in raw requests using Python. If there were any rate limiting or other protections, I’d move to Selenium. With the vast majority of sites I’ve written scrapers for, this worked fine. But because CloudFlare is still protecting fascist insurrectionists hellbent on committing violent acts against political enemies, I had to be a bit more creative.

First, I did some research on writing browser extensions in JavaScript. It’s relatively simple to do and I had a simple one working in about 30 minutes. I wrote the JavaScript scraper in about an hour and 30 minutes later I had a backend Flask server to process and store the posts in a Mongo database. The extension requires me to actively be on the page and requires some refreshing and debugging relatively frequently, but it works relatively well. Because CloudFlare, it was easiest to write a scraper for the front page that would grab usernames, post titles, dates and links but a bit harder to write one that would then fetch the body of those posts… so, I’ll likely have to either wait for CloudFlare to drop service for TheDonald (pls pressure them) or find some other kind of workaround.

Notably, @SoaTokDhole on Twitter found the real IP behind CloudFlare but it doesn’t lead to a CloudFlare bypass. If anyone does find one, it would be pretty cool of you to shoot that my way so I can ramp up scraping a bit more.

The Data

As of the time of writing, I’ve pulled about 12,000 links including the username, post title, date of post and links associated to the post and the user. The plan is to use these links as a list of links to scrape later once I find a CloudFlare bypass or CloudFlare drops their TheDonald. I’ll be doing some more posts in the future with some analysis of the data, but until then you can find the data on my GitHub.

The Future

I’m going to keep developing the scraper and will keep the data updated as well. You can follow me on Twitter to know when I push more data, or just follow the GitHub repo as well. If you want to help out, just go do it. Hit me up if you want to do something collaborative, but people across the world are hitting extremism where it hurts however they can. Be aware of the law, anything you do is your fault and not mine, blah blah blah. Oh, and put some more pressure on CloudFlare for continuing to protect these violent fascists using the hashtag… I don’t know, I’m bad at this. #CloudFlareProtectsFascists or something.

I write relatively extensively on the far-right and will be continuing to publish my research and findings on this blog. Watch this space for more research.



Previous article

No Matter What Happens Now, The Right Won on the Sixth

Next article

Lame.htb: HelloWorld for HackTheBox