5 Minutes of your time. It's worth it.

jsd512 · Jul 14, 2018

Born in the 70s, raised in the 80s. I have never been a part of social media and never will be; I grew up without it. I fell in love without it, and I made friends without it. I learned about the world around me without it. I didn't get a cell phone until I was 27, only because I had a piece of shit car that could break down at any moment. I didn't get a smart phone until 2015, because I like to take pictures. I don't even take it with me most times I leave the house. I would rather talk to someone than text. Shit, I don't even talk to people anymore, not really interested because I can't relate to most. All I hear revolves around Facebook, Twitter and Instagram, that's it. I truly have a hard time relating to this world we live in. I don't get it sometimes, living for those likes. I have to get those likes. My brother's ex, she ****ing lives for Facebook, Twitter and Instagram, and it makes me mad because she is also a mother. God that woman pisses me off. She's gonna be one of those Facebook moms, always looking to exploit her two boys for likes and sympathy, **** me.

mnewxcv · Jul 14, 2018

Don't worry, it only gets worse from here.

KingBlack · Jul 15, 2018

jaxbusa said:
Same here, I’m interested.

*Jay* said:
Sharing is caring. If you're able, I wouldn't mind a good read via p.m.

earico said:
I'm glad to see everyone enjoyed this video. It's something I've been passionate about for years and I have taught my kids the same. Now they are both in HS and wonderful young adults.

If you don't mind please share that with us.

Please keep in mind this is a heavily edited version of my report, as I only included the results of the Twitter analysis:

Executive Summary

This report provides an analysis and evaluation of the ability of machine learning to identify the dissemination of misleading information by individuals on social media platforms. Methods of analysis include text mining, sentiment analysis and neural networks. These methods were compared against each other to check results against expectations. The results of the analysis show that while it is possible to detect tweets that attempt to spread misinformation, it is extremely difficult to identify them at their inception.

Introduction and Background

During the 2016 presidential election of the United States, potential voters had access to more media outlets than previously possible due to technology. The lowered cost of technology has allowed segments of the population to obtain devices that were out of reach a decade earlier.

In previous elections, information on the subject came from a limited number of credible media outlets and flowed in one direction. Now that social media platforms exist online, the consumer of information can also be the provider. In addition, the low cost of acquisition to obtain the devices required for participation on social media platforms has essentially made every individual a media provider.

This has given individuals the ability to manipulate people's perception on a range of topics. Russia’s state-controlled media Rossiya Segodnya director general Dimitry Kiselev said “Objectivity is a myth which is proposed and imposed on us.” [1] The tools for this manipulation include text, images, and recorded media, among others.

The text mining of Twitter was a focal point of this report. Trending topics on Twitter indicate that a subject is currently in the top 10 of tweets on the platform. This is accomplished by averaging 250 tweets per hour over a 6-hour span, from 750 unique users. With this in mind, a diverse dataset is readily available for mining.

For the report, a trending topic of a divisive nature was selected for the study. The R package “twitteR” was used in RStudio to gather tweets related to that topic by its hashtag. These tweets were sanitized in R to remove punctuation, stop words, numbers and formatting.
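
The original sanitizing was done in R and is not reproduced in this post. As a rough illustration only, the same kind of cleanup might look like this in Python. The stop-word list and the sample tweet are made up for the example, and the `sanitize` helper is hypothetical, not the report's code.

```python
import re
import string

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "to", "of", "in", "on", "for", "this"}


def sanitize(tweet: str) -> list[str]:
    """Lowercase a tweet, then drop URLs, numbers, punctuation and stop words."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"\d+", " ", text)           # drop numbers
    # Replace every punctuation character (including '#') with a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    text = text.translate(table)
    return [word for word in text.split() if word not in STOP_WORDS]


# Made-up tweet, not one from the dataset.
print(sanitize("This is the rally of 2018 in Parkland!!! #NeverAgain https://example.com/x"))
# ['rally', 'parkland', 'neveragain']
```

Note that stripping the `#` leaves a multi-word hashtag as one long token.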

The data collected will be compared against other sources related to the same subject. The baseline document will come from an unbiased, reputable source. That document will be analyzed for sentiment, tone, and perceived attitude and assigned a score. The recovered tweets follow the same protocol and are then compared against the baseline.

Significance of the Topic

While there are many mining techniques used to identify trolls, none identifies them fast enough to alert administrators before the misinformation spreads. This project attempts to identify users that exhibit behavior consistent with trolls without the use of large datasets related to the Twitter subject in question. The innovation is that machine learning will be incorporated in an attempt to do this in a minimal number of tweets.

Existing methodology uses hashtags of previously trending topics to gather data for analysis and determine what accounts related to the topic can be identified as trolls. While that method does return a high success rate of learning, the troll has succeeded because the information has already been distributed. Current methods tend to be tools for confirmation instead of tools for prevention. The large datasets lend themselves to being formatted and prepared for machine learning, with the user waiting for certain results. Knowing what to expect before the analysis could very well lead to biased sampling, making the results less reliable.

Term Project Objective/Problem Statement

The purpose of this project is to identify a method of analysis that can detect misleading and divisive posts on Twitter within minutes of their being posted. The basic idea is to target new accounts that focus on hashtags related to current events that are politically polarizing.

The analysis uses different methods to reach a conclusion about the post. First, the post will be screened for the usage of negative and positive words. This will give a general idea of the direction the author is steering their potential audience.

Second, the sentiment of the tweet will be analyzed. This will help determine the tone of the author and give a better overall picture of the statement.

Literature Review

As this topic is dynamic in nature, most of the reading on the subject came from online articles. An article from medium.com confirmed a previous suspicion: Twitter rarely gets involved with accounts, even when it is aware of abuse [2]. Bot accounts tend to be web based while human owned accounts are likely to originate from a mobile device [3]. Psychosocial attacks [5] are a primary tool used by trolls to evoke fear and anger in a population and promote civil disorder.

Solution Approach

The methodology used for this project includes text mining for the most common terms and to discover user sentiment on the topic. An attempt will be made to better understand the pattern and behavior of an account that uses a hashtag of a powder keg topic.

An unbiased documentation of an event will be used for a baseline reading on the topic. The selected document will undergo a sentiment analysis to determine the overall tone of the report. This value will be used and compared with the selected tweet.

Tweets are collected by using the twitteR package to retrieve users that have commented on the hashtag. User data can be collected, as well as followers of the account, and other accounts that are followed. The information can be moved into a data frame, and location information, user id, actual text of the tweet, and much more can be extracted from the frame.

*technical details omitted*

This process occurs for the files housing the positive words list, the negative words list, the baseline document and the tweets. The count of negative words in the document is saved into a variable, then divided by the overall word count of the document. This value determines how “negative” the baseline document and tweet are.
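
To make the ratio concrete, here is a minimal Python sketch of the calculation described above (negative-word count divided by total word count). The word lists are invented for the example, and `negative_ratio` is a hypothetical helper name, not the report's R code.

```python
def negative_ratio(words, negative_words):
    """Return the share of words that appear in the negative-word list."""
    if not words:
        return 0.0  # avoid dividing by zero on an empty document
    hits = sum(1 for word in words if word in negative_words)
    return hits / len(words)


# Hypothetical lexicon and already-sanitized document.
negative_words = {"killed", "gunman", "terrorist"}
document = ["gunman", "killed", "students", "school", "officials", "said"]

print(round(negative_ratio(document, negative_words), 5))
```

The same function would be run on the baseline document and on each tweet so that the two values can be compared.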

A sentiment analysis is the next item to be performed. For the sentiment analysis, the process tries to determine the mood of the writer and gives the documents an overall score based on the sum of negative words less positive words. Accounts that have an alarming difference in sentiment compared to the baseline document will be flagged and added to a list for an analysis in rtweet and botornot. The botrnot package is used to predict if an account is a robot or human.
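
A minimal Python sketch of that scoring and flagging step follows. It assumes the score is simply the negative-word count minus the positive-word count, as described above. The threshold, the word lists, and the account names are all hypothetical, since the post does not say what counted as an "alarming" difference.

```python
def sentiment_score(words, negative_words, positive_words):
    """Score = number of negative words minus number of positive words."""
    negatives = sum(1 for word in words if word in negative_words)
    positives = sum(1 for word in words if word in positive_words)
    return negatives - positives


def flag_accounts(words_by_account, baseline_score, negative_words, positive_words, threshold):
    """Return accounts whose score differs from the baseline by more than threshold."""
    flagged = []
    for account, words in words_by_account.items():
        score = sentiment_score(words, negative_words, positive_words)
        if abs(score - baseline_score) > threshold:
            flagged.append(account)
    return flagged


# Hypothetical lexicons, baseline text and accounts.
NEGATIVE = {"hate", "terrorist", "liar"}
POSITIVE = {"hope", "support"}
baseline = sentiment_score(["officials", "support", "families", "hope"], NEGATIVE, POSITIVE)
accounts = {
    "user_a": ["hope", "families", "support"],
    "user_b": ["terrorist", "liar", "hate", "media"],
}

print(flag_accounts(accounts, baseline, NEGATIVE, POSITIVE, threshold=3))
# ['user_b']
```

Because the score is a raw count, longer texts can swing further from the baseline than short ones. Whether the report normalized for length is not stated.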

Results or Solution Evaluation

Overall, the plan of action was a complete failure. Baseline documents from reputable sources could not help in determining the sentiment of a tweet. There are several reasons this occurred. File sanitation included stemming words and removing hashtags. Stemming the words took many of them down to their core. During the sentiment analysis, manual comparisons were performed so that the computational results could be sanity checked. In many cases, over half of the words did not register to the machine due to stemming. Manual manipulation of files was required to correct this.

Hashtags often contained multiple words that are grouped together so that they could reach a larger audience. In an example, the February 2018 school shooting in Parkland, Florida, was used to collect data. Users often incorporate hashtags in the sentences to make a point and reach a larger audience. For example, #leftwingterrorist would show up as leftwingterrorist after removing the hashtag. The word terrorist is included in the negative words sentiment list, but the comparison function isn’t intelligent enough to understand that it is dealing with 3 different words. This led to miscalculations in the sentiment index. Manual corrections would require inspection of thousands of tweets from multiple users. The messages may contain a common theme but be as unique as the author in their delivery. It simply wasn’t possible to manually sanitize each tweet for analysis.
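
The hashtag problem can be reproduced in a few lines of Python. An exact-match lookup misses the fused token, while a simple dictionary-based split recovers the three words. The split shown here is only one possible approach and was not part of the original analysis. The vocabulary and the `segment` helper are hypothetical.

```python
def segment(token, vocabulary):
    """Split token into vocabulary words; return None if no full split exists."""
    # best[i] holds one segmentation of token[:i], or None if there is none.
    best = [None] * (len(token) + 1)
    best[0] = []
    for end in range(1, len(token) + 1):
        for start in range(end):
            piece = token[start:end]
            if best[start] is not None and piece in vocabulary:
                best[end] = best[start] + [piece]
                break
    return best[len(token)]


# Hypothetical vocabulary and negative-word list.
vocabulary = {"left", "wing", "terrorist"}
negative_words = {"terrorist"}
token = "leftwingterrorist"  # what remains of #leftwingterrorist after cleanup

print(token in negative_words)     # False: the fused token never matches
pieces = segment(token, vocabulary)
print(pieces)                      # ['left', 'wing', 'terrorist']
print(any(piece in negative_words for piece in pieces))  # True
```

A real vocabulary would produce ambiguous splits for many hashtags, so this only shows the idea and is not a working fix for the dataset.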

An issue that became clear during the analysis was the media of choice being delivered. While most people think of “140 characters” when they hear Twitter, there have been many improvements and changes in the last 5 years on the platform. Alternative forms of media delivery include image and video uploads. Recently, prerecorded video and images are outpacing typed words.

Several controversial tweets recovered contained multiple hashtags and an image. The hashtag simply served as means of spreading the message to a larger audience while the image contained the message with embedded text. It was not possible to get a sentiment score for messages of this nature. The embedded messages were often hateful, misleading information with numerous words that would register a hit in the sentiment analyzer.

During the analysis, it was discovered that coded analyzers were unable to determine the author's tone in a message.

*EDITED*
The baseline document negative score was 0.04316.

The following tweet received a negative score of 0.04545:

*EDITED*

There was almost no difference in the score, yet the user uses the embedded image and hashtags to reach a larger audience. The baseline document is factual and without bias while the tweet contains a symbol associated with hate groups, as well as an image that is emotionally disturbing. While they are not polar opposites, they clearly are not targeting the same audience.

Many tweets of this nature used the hashtag to engage the subject yet steered away from the topic to push a separate agenda. This was confirmed by doing a word cloud of the tweets with #parkland ranging from February 15th, 2018 to March 15th, 2018.
*WORD CLOUD I MADE FROM TEXT MINING THE MOST COMMON WORDS USED IN TWEETS WITH #PARKLAND*

Most individuals would likely have difficulty identifying the topic correctly based on the word cloud.

*skipping to the lesson learned*

Any attempt at achieving success in this field would require a massive investment in several technologies. Word based sentiment keywords would be required to not only identify words, but phrases using those words. It may be possible to use an NLP (natural language processing) classifier to alter the score of a word based on the word(s) that follow it.

-I'M NOT SAVING THAT DROWNING BABY
-I'M NOT LETTING THAT BABY DROWN

Two completely different sentiments in the above statements, yet they net the same score. The usage of NLP may help researchers develop a more concise list of words and the weight they carry by examining the words that precede and follow them.
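
A short Python sketch shows why a word-counting scorer cannot tell the two statements apart. The lexicon below is hypothetical and chosen only to illustrate the point. It treats "not" and the forms of "drown" as negative and everything else as neutral.

```python
def sentiment_score(words, negative_words, positive_words):
    """Score = number of negative words minus number of positive words."""
    negatives = sum(1 for word in words if word in negative_words)
    positives = sum(1 for word in words if word in positive_words)
    return negatives - positives


# Hypothetical lexicon for illustration.
NEGATIVE = {"not", "drown", "drowning"}
POSITIVE = set()

first = "I'M NOT SAVING THAT DROWNING BABY".lower().split()
second = "I'M NOT LETTING THAT BABY DROWN".lower().split()

score_first = sentiment_score(first, NEGATIVE, POSITIVE)
score_second = sentiment_score(second, NEGATIVE, POSITIVE)
print(score_first, score_second)  # 2 2
```

The scorer only counts which words appear, so "not" is scored the same whether it negates "saving" or "letting".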

Another missing technology that is needed is text recognition within images. This needs to be incorporated to have any realistic chance at predicting the sentiment of a tweet. Additionally, speech to text would be required, so that all forms of communications could be intercepted for analysis. If any form of communication is excluded, the analysis should be considered unreliable and incomplete.

While the methodology was a complete failure, many lessons were learned. The biggest is that no computer system can bypass human ingenuity. Simply rearranging the order of your words can manipulate or defeat most analyzers. Conversely, simply mandating a two-day waiting period before your first tweet would likely eliminate many trolling tweets. In addition, require a tweet within 5 days of signing up before releasing the account. This would keep an entity from making multiple accounts and storing them until they are needed.

While the goal of the analysis was to detect accounts of a divisive nature, it does appear that Twitter has tools to accomplish this. One item of interest is that it appears foreign accounts are removed, making it impossible for someone outside of the organization to collect data for study. American based accounts are left untouched. These accounts had similarities: a high number of followers, a high number of tweets, and heavy usage of imagery. Also, these accounts tend to have more retweets than original content. The user “*EDIT*” could potentially reach over 18,500,000 users at two levels based on the data collected on that account.

KilledbyKenne · Jul 15, 2018

I realize the video is a mesh of different people talking about multiple subjects thrown together to make you really think about things but it lost me when Rogan started talking about working for things that you want being a trap. Wtf is that shit? I'd understand if he was talking about people that work themselves to death 14 hours a day because they overextend themselves, but 8 hours a day plus commuting isn't over-working yourself. That's life.

Maybe there was a better message in the original clip that was lost in the edit. Idk

Khan · Jul 15, 2018

Very true. I have only 2 friends, like die for you friends. The rest of the people I can't relate with, especially with all this facebook crap. The question I always hear: "why is it you don't have any friends?" BECAUSE I DON'T NEED THEM LOL. Life is good when you live in solitude. You learn a lot about yourself and life!

aoc racer · Jul 15, 2018

At my job we hire high school students to learn job skills. It pleases me when a handful of them tell me they don't use social media because they see those older than them obsessed with their phones. One of them goes so far as to not have a smart phone. I hope this newer young generation that isn't considered millennials (I think?) follows that path more.

_Snake_ · Jul 15, 2018

musclefan21 said:
My only issue is, I go on dating apps because I'm single. I feel like that’s the only way to meet ladies nowadays

Nothing wrong with that IMO. It’s how my wife and I met.

Stang Lover · Jul 15, 2018

72SBC said:
Only social media I’m on is forums. If your friends or family don’t know how to get ahold of you, they either don’t want to or you’re better off not staying in contact.

If this ain't the freaking truth! Two of my buddies literally live 4 streets down from me, and I can call or text one of them and get no response unless it's convenient for him. If I comment on his FB post or instant message he'll respond, and I live less than 5mins away from him. The other one will just ask about me, and they live on the same street. Ain't that something?

I have a facebook but mainly for buying car parts or learning about the car itself and look at pretty firearms. Can catch some sweet deals on there when dealers post it up. That's all on my feed besides people I know or enthusiasts. I don't have any pics of myself on there just my car.
