Generating Alerts from Twitter

Note

If you’d like to run through this tutorial more quickly, be sure you’ve loaded Cyohon’s starter settings before you begin. These settings provide many of the configurations used in this tutorial.

Introduction

Suppose we want to capture a tweet like this:

The raw data for this tweet might look like this:

{
    "created_at": "Wed Apr 05 19:06:12 +0000 2017",
    "id": 849699711376293888,
    "id_str": "849699711376293888",
    "text": "Cyphon Release 1.0.0   https://t.co/Nn4Yo6ba6m",
    "truncated": false,
    "entities": {
        "hashtags": [],
        "symbols": [],
        "user_mentions": [],
        "urls": [{
            "url": "https://t.co/Nn4Yo6ba6m",
            "expanded_url": "https://www.cyphon.io/cyphon-blog/2017/4/5/release-100-is-available-now",
            "display_url": "cyphon.io/cyphon-blog/20\u2026",
            "indices": [23, 46]
        }]
    },
    "source": "\u003ca href=\"http://www.squarespace.com\" rel=\"nofollow\"\u003eSquarespace\u003c/a\u003e",
    "in_reply_to_status_id": null,
    "in_reply_to_status_id_str": null,
    "in_reply_to_user_id": null,
    "in_reply_to_user_id_str": null,
    "in_reply_to_screen_name": null,
    "user": {
        "id": 834789750624116736,
        "id_str": "834789750624116736",
        "name": "The Cyphon Project",
        "screen_name": "cyphonproject",
        "location": "Maryland, USA",
        "description": "An open platform for incident management and response. Maintained by @DunbarCyber.",
        "url": "https://t.co/LJgJSY1cwl",
        "entities": {
            "url": {
                "urls": [{
                    "url": "https://t.co/LJgJSY1cwl",
                    "expanded_url": "https://cyphon.io",
                    "display_url": "cyphon.io",
                    "indices": [0, 23]
                }]
            },
            "description": {
                "urls": []
            }
        },
        "protected": false,
        "followers_count": 20,
        "friends_count": 3,
        "listed_count": 0,
        "created_at": "Thu Feb 23 15:39:21 +0000 2017",
        "favourites_count": 1,
        "utc_offset": null,
        "time_zone": null,
        "geo_enabled": false,
        "verified": false,
        "statuses_count": 9,
        "lang": "en",
        "contributors_enabled": false,
        "is_translator": false,
        "is_translation_enabled": false,
        "profile_background_color": "F5F8FA",
        "profile_background_image_url": null,
        "profile_background_image_url_https": null,
        "profile_background_tile": false,
        "profile_image_url": "http://pbs.twimg.com/profile_images/872533195794903041/ZH7OQlO3_normal.jpg",
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/872533195794903041/ZH7OQlO3_normal.jpg",
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/834789750624116736/1490709818",
        "profile_link_color": "1DA1F2",
        "profile_sidebar_border_color": "C0DEED",
        "profile_sidebar_fill_color": "DDEEF6",
        "profile_text_color": "333333",
        "profile_use_background_image": true,
        "has_extended_profile": false,
        "default_profile": true,
        "default_profile_image": false,
        "following": true,
        "follow_request_sent": false,
        "notifications": false,
        "translator_type": "none"
    },
    "geo": {
        "type": "Point",
        "coordinates": [39.489207, -76.661301]
    },
    "coordinates": {
        "type": "Point",
        "coordinates": [-76.661301, 39.489207]
    },
    "place": {
        "id": "014f4cd64f60daea",
        "url": "https:\/\/api.twitter.com\/1.1\/geo\/id\/014f4cd64f60daea.json",
        "place_type": "city",
        "name": "Cockeysville",
        "full_name": "Cockeysville, MD",
        "country_code": "US",
        "country": "United States",
        "contained_within": [],
        "bounding_box": {
            "type": "Polygon",
            "coordinates": [
                [
                    [-76.6729123, 39.453647],
                    [-76.58212, 39.453647],
                    [-76.58212, 39.534252],
                    [-76.6729123, 39.534252]
                ]
            ]
        },
        "attributes": {}
    },
    "contributors": null,
    "is_quote_status": false,
    "retweet_count": 2,
    "favorite_count": 2,
    "favorited": true,
    "retweeted": true,
    "possibly_sensitive": false,
    "possibly_sensitive_appealable": false,
    "lang": "en"
}

Yikes! We’re only interested in a fraction of this information, so we’d like to condense all this into the following format:

{
    "created_date": "2017-04-16T14:05:17.000Z",
    "link": "https://twitter.com/cyphonproject/status/849699711376293888",
    "location": [-76.661301, 39.489207],
    "ref_id": "849699711376293888",
    "text": "Cyphon Release 1.0.0   https://t.co/Nn4Yo6ba6m",
    "user": {
        "link": "https://twitter.com/cyphonproject",
        "name": "The Cyphon Project",
        "profile_pic": "http://pbs.twimg.com/profile_images/872533195794903041/ZH7OQlO3_normal.jpg",
        "ref_id": "834789750624116736",
        "screen_name": "cyphonproject"
    }
}

We also want to analyze the tweet to determine whether it reflects a positive, negative, or neutral sentiment. If it’s negative, we want to be alerted to that fact.

In this tutorial, we’ll show you how to use Cyphon’s DataChutes to parse and save tweets in a condensed form, perform a sentiment analyses on them, and generate Alerts for tweets with negative sentiments.

Step 1: Container

We’ll start by creating a Container to store the condensed tweet. The Container will consist of a Bottle and a Label.

Bottle

../_images/bottle1.png

Open the “Shaping Data” panel on Cyphon’s main admin page, and create the following BottleFields:

Field Name Field Type Target Type
created_date DateTimeField DateTime
link URLField  
location PointField Location
name CharField Account
profile_pic URLField  
ref_id CharField  
screen_name CharField Account
text TextField Keyword

Now we’ll create two Bottles (one will be nested within the other). The first Bottle will contain data specific to the user who posted the tweet.

Using the BottleFields we just created, add these fields to a Bottle called “user”:

  • link
  • name
  • profile_pic
  • ref_id
  • screen_name

Now we’ll work on the Bottle for the tweet itself. Create a Bottle called “post” and add the following BottleFields:

  • created_date
  • link
  • location
  • ref_id
  • text

Finally, we need to embed the “user” Bottle within the “post” Bottle. To do this, we’ll create a special BottleField that refers to the “user” Bottle. You can create this new BottleField from the admin page of the “post” Bottle by clicking the “+” sign by the list of fields.

The new BottleField should have the following properties:

BottleField.field_name user
BottleField.field_type EmbeddedDocument
BottleField.target_type  
BottleField.embedded_doc user

Confirm that this new BottleField was added to the “post” Bottle and save the Bottle.

Protocol

../_images/protocol.png

Next, we’ll need to create a couple of Protocols that will perform the sentiment analyses, if they don’t already exist.

Under “App Configurations,” click “Protocols,” and create two new Protocols with the following properties:

Name Package Module Function
sentiment sentiment sentiment get_sentiment
polarity sentiment sentiment get_polarity

These Protocols will analyze text using the get_sentiment() and get_polarity() functions in the sentiment module within the Lab package’s sentiment subpackage. The get_sentiment() function will determine whether text is “positive,” “negative,” or “neutral.” The get_polarity() function will assign a numerical value for polarity (e.g., 0.1, -0.3, etc.). This will allow us to distinguish between tweets that are mildy negative and those that are strongly negative.

Procedure

../_images/procedure2.png

Now we’ll specify which field of the Twitter data object we want to analyze with these Protocols.

Under “Enhancing Data,” create two new Procedures with the following properties:

Name Protocol Field Name
text_sentiment sentiment text
text_polarity polarity text

These Procedures will be used to analyze the “text” field in our “post” Bottle.

Label

../_images/label1.png

Next, we’ll create the Label that will contain the results of our analyses.

First we’ll create LabelFields that will store the results of the two Procedures we created. Under “Shaping Data,” create two new LabelFields with the following properties:

Field Name Field Type Analyzer
polarity FloatField text_polarity (procedure)
sentiment CharField text_sentiment (procedure)

Add the two LabelFields to a new Label called “post”.

Container

../_images/container1.png

We now have all the pieces to make a Container. Create a Container called “post” using the “post” Bottle and the “post” Label. The preview should look like this:

{
    "_metadata": {
        "polarity": "FloatField",
        "sentiment": "CharField"
    },
    "created_date": "DateTimeField (DateTime)",
    "link": "URLField",
    "location": "PointField (Location)",
    "ref_id": "CharField",
    "text": "TextField (Keyword)",
    "user": {
        "link": "URLField",
        "name": "CharField (Account)",
        "profile_pic": "URLField",
        "ref_id": "CharField",
        "screen_name": "CharField (Account)"
    }
}

Note that our LableFields are stored under the “_metadata” field.

Define a Taste for this Container with the following properties:

Taste.author user.screen_name
Taste.title text
taste.content text
taste.location location
taste.location_format Longitude, Latitude
taste.datetime created_date

Step 2: Condenser

../_images/condenser2.png

We now need to map the fields from the Twitter data to the fields of the Container we created. We can do this by creating a Condenser.

Under the “Condensing Data” panel on the main admin page, click “JSON Data,” and create the following DataParsers:

Name Source Field Method Formatter
coordinates.coordinates__COPY coordinates.coordinates Copy  
created_at__DATE created_at Date from string  
id_str__COPY id_str Copy  
text__COPY text Copy  
twitter__content_link user.screen_name, id_str Copy https://twitter.com/{}/statuses/{}
twitter__user_link user.screen_name Copy https://twitter.com/{}/
user.id_str__COPY user.id_str Copy  
user.name__COPY user.name Copy  
user.profile_image_url__COPY user.profile_image_url Copy  
user.screen_name__COPY user.screen_name Copy  

Next, we’ll create DataCondensers to map these DataParsers to the fields of the “post” Bottle and its nested “user” Bottle.

Create a DataCondenser called “twitter__user” and associate it with the “user” Bottle. Add the following DataFittings:

Target Field Parser (data parser)
link twitter__user_link (data parser)
name user.name__COPY (data parser)
profile_pic user.profile_image_url__COPY (data parser)
ref_id user.id_str__COPY (data parser)
screen_name user.screen_name__COPY (data parser)

Now create another DataCondenser called “twitter__post,” and associate it with the “post” Bottle. Add the following DataFittings:

Target Field Parser
created_date created_at__DATE (data parser)
link twitter__content_link (data parser)
location coordinates.coordinates__COPY (data parser)
ref_id id_str__COPY (data parser)
text text__COPY (data parser)
user twitter__user (data condenser)

Step 3: Distillery

../_images/distillery2.png

Next, we need to specify where the data will be stored. To do this, we’ll create a Distillery, which defines the data store and the data schema for the tweets. The data schema is represented by the Distillery’s Container (which we already created). The data store is represented by the Distillery’s Collection.

Before we create the Collection, we first need to establish a Warehouse for it. Open the “Storing Data” panel, and create a Warehouse with the following properties:

Warehouse.backend elasticsearch
Warehouse.name cyphon
Warehouse.time_series True

This will establish daily indexes in Elasticsearch with the pattern “cyphon-YYYY-MM-dd.”

Note

Time-series indexes make it easier to archive or delete old data. See Elasticsearch’s documentation on retiring data for more info.

Next, create a Collection in this Warehouse for social media posts:

Collection.warehouse cyphon
Collection.name social

This will create a “social” document type within the “cyphon” Elasticsearch index. (If you were using MongoDB as a backend instaad, it would create a collection called “social” in a database called “cyphon.”)

Under the “Distilling Data” panel, create a Distillery that uses this Collection and the “post” Container we previously created:

Distillery.collection elasticsearch.cyphon.social
Distillery.container post

Step 4: Munger

../_images/munger1.png

Now that we have a Condenser to process the data and a Distillery to store it, we can put them together to create a Munger.

Open the “Sifting Data” panel, and click on “JSON Data.” Create a DataMunger with the following properties:

DataMunger.name tweets
DataMunger.distillery elasticsearch.cyphon.social
DataMunger.condenser twitter__post

Step 5: Chute

../_images/chute1.png

Now that we hava a Munger, we can create a Chute to catch incoming tweets.

Create a DataChute with the following properties:

DataChute.sieve  
DataChute.munger tweets
DataChute.endpoint Twitter PublicStreamsAPI
DataChute.enabled True

We’re not associating a Sieve with this Chute, which means it will capture all incoming tweets and send them to the “Social Media” Munger. This Munger will parse the tweets and save them in the “elasticsearch.cyphon.social” Collection.

Note

If you’d only like to send a subset of tweets to a particular Munger, you can create a Chute with a Sieve that captures only the documents you want. Creating several Chutes with different Sieves and Mungers will allow you to save tweets on different topics to different Collections.

Step 6: Watchdog

../_images/watchdog2.png

Now that we’ve made a Chute to save the tweets, we need to create a Watchdog to generate Alerts.

Let’s suppose we want to generate an Alert whenever we get a negative tweet about a company called “Acme”. We’ll start by making some DataRules to identify these tweets.

Under the “Sifting Data” panel, click on “JSON Data” and create the following DataRules:

DataRule Name Field Name Operator Value
text_contains_acme text contains acme
sentiment_equals_negative _metadata.sentiment equals negative

We can then use these DataRules to create a DataSieve that a Watchdog will use to inspect the document:

DataSieve Name Logic Node 1 (data rule) Node 2 (data rule)
Negative Acme posts AND text_contains_acme sentiment_equals_negative

Now go to the “Configuring Alerts” panel and create the Watchdog:

Watchdog.name Acme
Watchdog.enabled True

Add the following Trigger to the Watchdog:

DataSieve Alert Level Rank
Negative Acme posts High 0

Step 7: Filter

../_images/filter1.png

Now that we’ve configured Cyphon to handle the tweets, we need to set up a Filter that will collect tweets about “Acme.”

Under the “Filtering Data” panel, create a SearchTerm for the term “acme”. Then create a Filter based on the SearchTerm you just added.

Step 8: Stream

../_images/stream1.png

Finally, we need to connect Cyphon to a Twitter stream, so it can start gathering tweets. To do this, follow the instructions under Twitter Setup. When you’re done, make sure the “twitter” Reservoir is enabled (under “App Configurations > Data Collection”).

You should now be receiving tweets! You can double-check this by clicking on “Streams” under the “Records” panel of the admin page. You should see a stream for “Twitter PublicStreamsAPI: Twitter”.

Conclusion

In this tutorial, we showed you how to collect, analyze, and generate Alerts from tweets. Now that you have a basic idea of how to filter tweets and set up rules, you can experiment with more advanced rulesets that generate high-, medium-, and low-level Alerts.