How Commenting on foxnews.com Feeds Big Tech (Google)

I was shadowbanned on foxnews.com. This led me on a fascinating journey of discovery into how commenting on foxnews.com is making Google's algorithms smarter, and making it easier to suppress conservative views in the future.

I shouldn't be writing this right now. I have "work-work" to do, but this has been weighing on me all day. So I'm going to crank out a quick write-up, without FULL analysis, but will show you how you can get the same data for yourself so you can analyze your own comment history.

This is not insider info, but it might as well be. I'm sure very few people are aware this data exists, let alone have looked at it. You are likely to find your own history as fascinating as mine.

Finally, I've always marveled over the fact that foxnews.com is one of the few major news sites left that allows people to comment on articles. I've always thought of it as an exercise in "free speech," not a giant data collection scheme. Furthermore, I've recognized that commenting on foxnews.com is akin to screaming at the sky - many comments go unnoticed, especially on fast-moving articles. Nonetheless, I participated, often scrolling through the comments after reading an article to better understand the sentiments of others, and feeling compelled to comment in return.

(The first few sections are background. Skip down to "The Good Stuff" if you want some red meat.)

Being a frequent visitor of foxnews.com, I picked up on my shadowban rather quickly. Shadowbanning is a rather insidious way of suppressing a user. The qualities of a shadowban often have these three elements:

Shortly after the Mueller testimony, I noticed that my comments were not being up-voted in the usual fashion. After four days of no interaction, I realized I must have been shadowbanned.

Frustrated with being shadowbanned, I started poking around the comments section to see if there was any way to engage the moderators. My comments are usually not offensive, per se, except to liberal sensibilities. I wanted to know what got me shadowbanned.

1. Expand the drop down for your user name in the Comments section, just above where you make comments.
2. Click Privacy. A Privacy overlay will pop up.
3. Click the "Request Full Data Set" button.
4. You will receive a "confirm your email address" email. Mine went to junk. Confirm. I got a webapi error on confirmation, but it seems to have worked anyway.
5. Wait 12-24 hours. You will receive another email saying your data is ready with a link to download it.
6. It's a little scary, because clicking the link immediately attempts to download a zip file with 4 embedded JSON files. You always want to be wary of what you are downloading! I verified the email source via headers, then opened it on a burner computer. It's safe - just JSON data.

YOU CAN DO THIS, TOO! If you are a frequent commenter on foxnews.com, I STRONGLY encourage you to get the data for yourself!

The comments section of foxnews.com is supported by Spot.IM, an Israeli company. They are governed by certain EU rules, which means the data I received from them was pretty significant.

There are four files in all. The file with the bulk of the data you'll want to analyze is messages.json. The data is in json format, which looks like crazy computer code to the untrained eye, but is actually pretty easy to understand. Individual message objects are contained in curly braces { }, with name-value pairs of data inside. I recommend opening it in an application like Notepad++ for your first look.

My file contained data for 2,184 comments made since Fri, 22 Jun 2018 22:07:16 GMT. That's A LOT of screaming at the sky! (The times are listed in Unix time and will look like this - "written_at":1529705236.3577912 . You can take the 1529705236.3577912 part and put it into a Unix time converter, such as the one at http://www.onlineconversion.com/unix_time.htm .)
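If you would rather not paste timestamps into a converter one at a time, here is a minimal Python sketch of the same check. It assumes only what is described above: that messages.json is valid JSON and that each message object carries a "written_at" value in Unix seconds. How the objects are nested inside the file is not assumed, so the helper simply walks everything. The function names and the "messages" wrapper key in the stand-in data are made up for illustration.

```python
import json
from datetime import datetime, timezone


def iter_messages(node):
    """Yield every JSON object (dict) that has a "written_at" key, at any depth."""
    if isinstance(node, dict):
        if "written_at" in node:
            yield node
        for value in node.values():
            yield from iter_messages(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_messages(item)


def to_gmt(unix_seconds):
    """Format a Unix timestamp (seconds since 1970-01-01 UTC) as a GMT string."""
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return moment.strftime("%a, %d %b %Y %H:%M:%S GMT")


def summarize(data):
    """Return (comment count, GMT time of the earliest comment or None)."""
    stamps = [m["written_at"] for m in iter_messages(data)]
    earliest = to_gmt(min(stamps)) if stamps else None
    return len(stamps), earliest


# Stand-in data for illustration; the wrapper key "messages" is hypothetical.
sample = json.loads('{"messages": [{"written_at": 1529705236.3577912}]}')
print(summarize(sample))
# (1, 'Fri, 22 Jun 2018 22:07:16 GMT')
```

To try it on a real export, replace the stand-in with `json.load(open("messages.json", encoding="utf-8"))`. If your file nests one timestamped object inside another, the count will include both, so treat the total as a rough figure.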

I was able to almost immediately confirm that I had, in fact, been shadowbanned based on the following metadata:

"tags":["blacklisted_sender"],
"tags_metadata":{"automated_state":"force_block_all"}

It is important to note that once "blacklisted_sender" is applied to your profile, it will show up in the data for EVERY comment, even before you were shadowbanned. To figure out what initiated the shadowban, you have to find the initial occurrence of this tag by searching up from the bottom:

After the Mueller hearing, I quickly flipped through ~3 articles/opinion pieces on the hearing. In each article, I saw a steady stream of liberal comments screaming, "IMPEACH NOW!" I posted a comment reading:

All the people fantasizing about impeachment are the TRUE "bitter clingers" in this country.

I saw a couple of specific comments screaming for impeachment that I wanted to respond to, so I cut and pasted that comment a total of 5 more times in the span of about 2 minutes 30 seconds. This was dumb on my part, as it got me flagged for spam. Interestingly, though, the status for all those messages went to...

...which suggests that someone had to manually look at the comments, and (presumably) my history, and decide to shadowban me. With over 2,000 comments on the site at that point, I think it is pretty clear I'm not a "MAKE MONEY FROM HOME!!!" spammer and that I was just shooting off with a little unnecessary repetition.

Oh, well, I got shadowbanned. Buuuuuuut... LO! What is this in the file Spot.IM sent me?

Along with EVERY comment I've ever made, there is some interesting metadata attached! There is a section called "google_perspective_hidden". For the comment I "spammed," it looks like this:

"google_perspective_hidden":{
"attack_on_author":"3%",
"attack_on_commenter":"37%",
"identity_attack":"17%",
"incoherent":"38%",
"inflammatory":"64%",
"insult":"22%",
"likely_to_reject":"52%",
"obscene":"12%",
"profanity":"9%",
"sexually_explicit":"11%",
"spam":"9%",
"threat":"11%",
"toxicity":"22%"
}

Very interesting. I am a lifelong IT professional (25+ years) who has also taken Andrew Ng's Stanford course on Machine Learning. What does this tell me? (Disclaimer: There are some suppositions below, but my experience gives me a high level of confidence in them.)

First, every comment you make on Fox News is getting fed to a Google API that analyzes the text of the comment and rates it on several metrics.

Second, this API is most likely an "online machine learning system" which improves over time as it ingests data. In this case, "online" does not mean "on the web." It means that the system actively learns with each new data point fed to it - i.e. not bulk learning, but continuous learning.

Third, a system cannot learn unless you give it feedback to inform it when its assumptions are right. In some cases, that feedback can be in the form of additional system data - such as comment up-votes, down-votes, responses, and/or flagging for abuse. In other cases, human users can validate the "correctness" of the assumption.
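To make "online" learning concrete, here is a toy sketch in Python. It is not Google's system, whose internals cannot be seen from the data file. It is a textbook online logistic regression, and the feature vector, learning rate, and feedback label are all invented for illustration. The point it shows is that each example nudges the weights once and then does not need to be kept.

```python
import math


class OnlineLogistic:
    """Toy online logistic regression: one gradient step per example."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        """Return the predicted probability (0..1) that the label is 1."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def learn(self, features, label):
        """Update the weights from a single (features, label) pair."""
        error = self.predict(features) - label  # gradient of log loss w.r.t. z
        for i, x in enumerate(features):
            self.weights[i] -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error


# Hypothetical feature vector: [contains_slang, all_caps_ratio].
model = OnlineLogistic(n_features=2)
comment = [1.0, 0.2]

before = model.predict(comment)  # 0.5 with all-zero weights
model.learn(comment, label=1)    # feedback: e.g. the comment was flagged
after = model.predict(comment)

print(before, after)             # the score rises after one flag
```

Whoever supplies the labels decides which direction each nudge goes, which is why the source of the feedback matters.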

This third point is kind of important. Why? Because recent studies have shown that conservatives are LESS likely to engage negatively with content they disagree with compared to liberal users. That is, liberals are more likely to unfriend, dislike, block, or flag for abuse than conservatives. That means that if the system is using this kind of feedback to validate assumptions for metrics like "toxicity," then even the data from foxnews.com is probably training the system to learn in a "liberal" way.

There are hundreds of examples in my file where the Google API is rating comments I posted in a "questionable" manner. In the interest of brevity (haha), I will post only a couple. Again - I encourage you to get your own data and look at the ratings!

Here's a comment I made in response to someone claiming our country needs to be "fixed":

Our country doesn't need to be "fixed." Restored? Maybe, if you are talking about getting back to basics and ditching the BS PC garbage the left is injecting.

"google_perspective_hidden":{
"attack_on_author":"5%",
"attack_on_commenter":"18%",
"identity_attack":"47%",
"incoherent":"7%",
"inflammatory":"39%",
"insult":"54%",
"likely_to_reject":"95%",
"obscene":"98%",
"profanity":"54%",
"sexually_explicit":"16%",
"spam":"5%",
"threat":"19%",
"toxicity":"61%"
}

Not too bad. The inclusion of "BS" got it marked for obscenity and that may have elevated the toxicity, giving the comment a predicted 95% chance to be rejected.

Now let's look at something more benign. In response to one of the mass shootings, when lots of people were posting about "Gun control now!", I took a different tack and addressed the lack of morality in society.

We've become more godless, more morally "flexible", and more jaded to violence in general. Movies nowadays are often a form of "murder porn." It's not enough that the bad guy dies. He has to die in the most violent and "inventive" way they can come up with. No one is shocked. Instead they queue up to see John Wick raise his body count by another 500.

"google_perspective_hidden":{
"attack_on_author":"1%",
"attack_on_commenter":"3%",
"identity_attack":"43%",
"incoherent":"47%",
"inflammatory":"81%",
"insult":"50%",
"likely_to_reject":"54%",
"obscene":"49%",
"profanity":"54%",
"sexually_explicit":"76%",
"spam":"48%",
"threat":"69%",
"toxicity":"56%"
}

Take note that this comment is rated 81% likely to be inflammatory. (There is more on this SPECIFIC comment in The Good Stuff, Part Two).
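The scores are stored as percent strings, so they need converting before you can sort or filter them. Here is a minimal Python sketch that uses the block shown above as input. The helper names `parse_scores` and `top_scores` are made up for this example, and it assumes every value is a whole-number percent string, as in the samples shown.

```python
import json

# The "google_perspective_hidden" block for the John Wick comment, as shown above.
raw = """
{"attack_on_author":"1%","attack_on_commenter":"3%","identity_attack":"43%",
 "incoherent":"47%","inflammatory":"81%","insult":"50%","likely_to_reject":"54%",
 "obscene":"49%","profanity":"54%","sexually_explicit":"76%","spam":"48%",
 "threat":"69%","toxicity":"56%"}
"""


def parse_scores(block):
    """Convert {"metric": "81%"} style values into integers (81)."""
    return {name: int(value.rstrip("%")) for name, value in block.items()}


def top_scores(block, n=3):
    """Return the n highest-rated metrics as (name, percent) pairs."""
    scores = parse_scores(block)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:n]


print(top_scores(json.loads(raw)))
# [('inflammatory', 81), ('sexually_explicit', 76), ('threat', 69)]
```

Running the same helper over every comment in your own file would let you pull out the ones rated highest on any single metric.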

Here's the thing: neither of these comments was auto-moderated out of existence. We also don't know which way the algorithm is learning. Is that comment going to be rated MORE or LESS inflammatory over time? The important takeaway is THIS:

Commenting on foxnews.com is feeding data to Google so they can build better, smarter systems to censor people in the future. Given the political persuasions of Google, I think it should be pretty clear that this data will only lead to more conservative censorship. Moreover, I think if you dig through your own file, you'll see that pattern starting to emerge.

This is going way longer than expected, so I'll try to keep this part short - promise!

Another key data set that you'll find in your file is the set of comments that were reported for abuse. You'll see it appear something like this:

The "John Wick" comment I made above was reported. Another user "took offense" to that comment, and clicked the "Report" button.

I put "took offense" in quotes. Why? Because I don't believe the user was actually offended. Instead, I believe the user wanted to temporarily suppress the comment until the conversation stream on Fox had effectively buried the comment. They wanted to HIDE the comment by reporting it. It didn't toe the line with "ban guns," so they didn't want others to see it.

Bold supposition, right? Wrong. Looking through the list of comments I made that were reported, I see a pattern. Many comments are pretty straightforward. They present facts or opinions the other side doesn't want people to see, but are otherwise inoffensive. For example, on an article about Bill Nye and climate change, I posted the following:

Do yourself a HUGE favor and google "wattsupwiththat bill nye" and look at the article entitled "Al Gore and Bill Nye FAIL at doing a simple CO2 experiment" to see what a charlatan and liar Bill Nye is.

Reading this comment, do you think it rises to the level of offense that it must be reported? This was also my first comment on that article... so why report it? Unless, of course, you simply don't want curious people to see that and follow-up themselves.

I think Nancy Pelosi said it right: A glass of water with a D next to it could have won in her district. AOC is hardly accomplished.

Pretty sure he's right. Neither my best friend nor I voted for Trump (I wrote in Cruz, I think he wrote in Rand Paul - ha), and we are both ready to pull the lever for Trump - not because we think he's awesome, per se, but because we are both so disgusted by liberal shenanigans, we want to ensure they don't win.

Um, wrong. It is ILLEGAL to cross the border without permission. That is an established law. It is LEGAL for me to own a firearm. So, good luck trying to take them via National Emergency. Molon labe.

There are many examples in my file where the comment simply seems to be reported because the person reporting it didn't like the sentiment, not because it was inherently offensive. Again, there are two net results of this malicious reporting activity:

1. It makes the comment disappear until moderation gets to it. Often this pushes it out of the reading pane for new readers, effectively burying the comment.
2. It helps train the Google algorithm to "understand" that conservative content (like "Molon labe") is offensive.

So take a look at what has been reported on your file. Understand what the left is doing, either wittingly or unwittingly.

I now have a file with nearly 100k lines of information giving me insight on how big tech's systems view my comments on foxnews.com. Looking over the system ratings is fascinating, but it also forced me to realize something with sinking despair...

Commenting on foxnews.com is just feeding data to Google - a known liberal player - to better train their algorithms to help censor conservative speech in the future.

(Final note: I noticed other stuff, too, but as I wrote at the outset, I have work-work to do, and this took a long time to write up.)

Very interesting.

I wonder if blocking the google and google related Java ‘bots’ in my browsers and having all the google websites blocked (sent to 127.0.0.1) in the HOSTS file, would mess with the FauxNoose - Google algorithms.

I’ve been ranting about Google spying for years, but no one believes me, or if they do believe it, they don’t care, being addicted to all the ‘free’ stuff Google gives them.

.............

And at least here at FR, even though there is a google tracker on every page, at least there is no shadow banning... And I rarely even go to Faux, and have never commented there.

No - this would not work. The Google API is being called AFTER the submit to the Spot.IM servers based on all available evidence. So long as you submit a comment to Spot.IM, Google is getting it.

I should point out that they may have a data retention agreement with Google that disallows Google from keeping the comment, but - as you may know - an online learning algorithm doesn’t need to keep the data to benefit from it. The data nudges the algorithm in the right direction and is then discarded.

Here’s what I’ve noticed in relation to trolls on the foxnews.com comments sections:

For the most part, conservative views are well represented and up-voted on Fox. On hot-button issues, though, you will see sweatshop-esque activity kick in, with a LARGE volume of liberal comments generated in a short period of time, typically when news hits. In general, when the sweatshop is at work, the liberal comments are very short, all follow a similar pattern, and often are ad hominem in nature. It is also during these times that you will see comments regularly disappearing due to being flagged, as described in my original post.

Beyond that, the trolling is “reasonable.” A minority of leftists who feel like it is their job to tell everyone how bad Trump is.

Well, that sucks. Thank you for the further information.

I suppose if you block Spot.IM too, then the comment engine wouldn’t even run?

Did I ever mention that I hate google?

I have quit doing business with, or even going to, many retail websites that require Google Java to use them. And I’ve told the business owners why they no longer get my business.

Same with Amazon.... At least Amazon has only about 1/4 the penetration Google has.

I don’t go to the FoxNews website anymore. Can’t stand all the pop-up ads and videos before I can get to the articles. I am not a computer genius so I don’t understand most of what you posted but I do agree we are being controlled by the Left and this crap has to stop. Sure hope Trump can do something about it.

I thought it was MSNBC that used to allow comments.

Anyway, Samoa changed its time zone from -12 GMT (or +12?) to +13 GMT. The story said that Samoa would now have the “earliest sunrise.”

I commented that “earliest civil midnight” was not the same thing as “earliest sunrise.”

My comment was deleted, and the story was not changed.

Thanks, very interesting.

I have often thought that some things on Facebook, or online, are put out there specifically to collect data on people, or to train AI programs to deduce things about them. Things like “Take this Quiz”, “Do you Remember” or “Can you Name”.

Somewhere else, I recall reading that a main effort of Google was to compile a database of the population that included measures of psychological or attitudinal tendencies. They would draw input from multiple sources to fill out the fields and expand the file on individuals. That database itself could be used to train algorithms.

What magazines someone subscribed to could correlate with/indicate their political leanings; what music they listened to might correlate with/indicate their age, gender or ethnicity; etc.

Supposedly, it was this kind of trove of information on the population (some populated from Government databases or on Government contracts) that Eric Schmidt spun off to a new company he formed to help the Democrats in their election efforts of identifying (and more effectively manipulating) voters.

In addition to their algorithms, they have psychologists working to identify more ways to identify characteristics they can exploit, like shopping or voting decision making factors, and how those vary among groups - for targeting messaging, or manipulative sales techniques.

Fair enough, but your comment seems to miss the point.

Can you say with certainty that all of the services you DO use on the internet are not feeding data to Google? Discovering that Google was involved in Fox News comments was a surprise to me. Seeing the kind of processing they are doing on the comments is troubling given their bias. At the end of the day, it underscores the recent alarm over big tech bias.

I don’t use Google any more either - I use duckduckgo.com. I was not aware that Google could be potentially impacting Fox News comments. Where else are they exerting influence? When they exert that influence, is there a bias?

Finally - and most importantly - this is about the future. Right now they are building a system with data we are unwittingly providing to later (potentially) use against us. As a technologist, I see a significant threat on the horizon. It’s Big Brother territory.

Pretty much every website uses some kind of google services. They’ve spent well over a decade developing analytic tools for the web and they have the market pretty well cornered. There is no equivalent.

If you want to know who’s visiting your site, where they’re from geographically, what website they were previously on, which pages they visit on your site, how long they stay, what keywords they searched for when/if they used search results to end up on your site, etc., it’s google and only google.

google analytics
google webmaster tools
google maps - is used in conjunction with the above for geographic data (and people think they’re just being nice with their Get Directions service)
google ads (the biggest online ads service)
youtube is google
gmail
google drive
Android operating system
Alexa

Enormous conglomerate to the web. With all the above services, apps, Android OS, they collect as much data as they can and put it all together.

Other internet services, like commenting systems for instance, also utilize google services. Their tentacles are everywhere.

...addicted to all the ‘free’ stuff Google gives them.
__________________________________
Privacy and free speech are a couple of my major personal concerns.

However: I write. Unfortunately, Google is better at some searches/services than any other search engine. I limit Google searches, but I need to use them.

Google Docs has taken over the world. Not only is it taught and used preferentially in schools, editors prefer a Google Docs link to a Word attachment. (I suspect that when I use Word’s research tools, my searches are still being forwarded ‘somewhere’. If I import visuals, including original illustrations, from my own machine to Word, they, too are likely then available elsewhere.)

Google is the preferred sign-in for too many sites to count.

On and on. They are ubiquitous.

...conservative views are well represented and up-voted on Fox. On hot-button issues, though, you will see sweatshop-esque activity kick in, with a LARGE volume of liberal comments generated in a short period of time, typically when news hits. In general, when the sweatshop is at work, the liberal comments are very short, all follow a similar pattern, and often are ad hominem in nature. It is also during these times that you will see comments regularly disappearing due to being flagged, as described in my original post.

Beyond that, the trolling is “reasonable.” A minority of leftists who feel like it is their job to tell everyone how bad Trump is.

______________________

Noticed the same activity pattern on Quora.

Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.