API Documentation

This is the API documentation for ThreatIngestor.

threatingestor

class threatingestor.Ingestor(config_file)

Bases: object

ThreatIngestor main work logic.

Handles reading the config file, calling sources, maintaining state, and sending artifacts to operators.

run()

Run once, or forever, depending on config.

run_once()

Run each source once, passing artifacts to each operator.

run_forever()

Run forever, sleeping for the configured interval between each run.

threatingestor.artifact_types(artifact_list)

Return a dictionary with counts of each artifact type.
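The counting behaviour described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the `URL` and `Domain` classes are hypothetical stand-ins, `count_artifact_types` is a made-up name, and keying the result by class name is an assumption.

```python
from collections import Counter


class URL:
    """Hypothetical stand-in for threatingestor.artifacts.URL."""


class Domain:
    """Hypothetical stand-in for threatingestor.artifacts.Domain."""


def count_artifact_types(artifact_list):
    """Return a dict mapping each artifact's class name to its count.

    Assumption: the type key is the class name; the real function may differ.
    """
    return dict(Counter(type(a).__name__ for a in artifact_list))


counts = count_artifact_types([URL(), URL(), Domain()])
print(counts)
```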

threatingestor.main()

CLI entry point; reads sys.argv directly.

artifacts

class threatingestor.artifacts.Artifact(artifact, source_name, reference_link=None, reference_text=None)

Bases: object

Artifact base class.

match(pattern)

Return True if regex pattern matches the deobfuscated artifact, else False.

May be overridden or extended by child classes.

format_message(message, **kwargs)

Allow string interpolation with artifact contents.

Optionally extend in child classes to add support for more specific interpolations.

Supported variables:

  • {artifact}
  • {reference_text}
  • {reference_link}
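A rough sketch of how the interpolation and its extension in child classes might look. The classes below are hypothetical stand-ins that only mirror the documented constructor signature, and the use of `str.format` is an assumption about the mechanism.

```python
class Artifact:
    """Hypothetical stand-in mirroring the documented constructor signature."""

    def __init__(self, artifact, source_name, reference_link=None, reference_text=None):
        self.artifact = artifact
        self.source_name = source_name
        self.reference_link = reference_link or ""
        self.reference_text = reference_text or ""

    def format_message(self, message, **kwargs):
        # Fill the three documented variables; child classes pass extra
        # variables through **kwargs to support more specific interpolations.
        return message.format(
            artifact=str(self.artifact),
            reference_text=self.reference_text,
            reference_link=self.reference_link,
            **kwargs
        )


class Domain(Artifact):
    """Stand-in child class that adds one more variable, {domain}."""

    def format_message(self, message, **kwargs):
        return super().format_message(message, domain=str(self.artifact), **kwargs)


d = Domain("example.com", "my-feed", reference_link="https://example.org/post")
msg = d.format_message("{domain} seen at {reference_link}")
print(msg)
```

Because the child class delegates to the base method, every base variable stays available alongside the new one.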
class threatingestor.artifacts.URL(artifact, source_name, reference_link=None, reference_text=None)

Bases: threatingestor.artifacts.Artifact

URL artifact abstraction, unicode-safe.

match(pattern)

Filter on some predefined conditions or a regex pattern.

If pattern can be parsed as one of the conditions below, it returns the truthiness of the resulting expression; otherwise it is treated as regex.

Valid conditions:

  • is_obfuscated
  • is_ipv4
  • is_ipv6
  • is_ip
  • is_domain
  • not {any above condition}
  • {any comma-separated list of above conditions}

For example:

  • is_obfuscated, not is_ip
  • not is_obfuscated, is_domain
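The condition-or-regex fallback described above can be sketched like this. It is an illustration under assumptions, not the library's code: `evaluate_conditions` and `match_url` are hypothetical helpers, the condition results are supplied as a plain dict, comma-separated conditions are assumed to be ANDed, and the regex fallback is assumed to use `re.search`.

```python
import re

CONDITIONS = ("is_obfuscated", "is_ipv4", "is_ipv6", "is_ip", "is_domain")


def evaluate_conditions(pattern, checks):
    """Evaluate a comma-separated condition list against `checks` (name -> bool).

    Returns None if `pattern` is not a valid condition list, so the caller
    can fall back to treating it as a regex.
    """
    results = []
    for term in pattern.split(","):
        words = term.split()
        if len(words) == 2 and words[0] == "not":
            negate, name = True, words[1]
        elif len(words) == 1:
            negate, name = False, words[0]
        else:
            return None
        if name not in CONDITIONS:
            return None
        # A negated condition passes when the underlying check is False.
        results.append(checks[name] != negate)
    return all(results)


def match_url(pattern, text, checks):
    """Try the pattern as a condition list first, then as a regex."""
    verdict = evaluate_conditions(pattern, checks)
    if verdict is None:
        return re.search(pattern, text) is not None
    return verdict


checks = {"is_obfuscated": True, "is_ipv4": False, "is_ipv6": False,
          "is_ip": False, "is_domain": True}
print(match_url("is_obfuscated, not is_ip", "http://example.com/x", checks))
```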
format_message(message, **kwargs)

Allow string interpolation with artifact contents.

Supported variables:

  • {url}
  • {defanged}
  • {domain}
  • All supported variables from Artifact.format_message

is_obfuscated()

Boolean: is an obfuscated URL?

is_ipv4()

Boolean: URL network location is an IPv4 address, not a domain?

is_ipv6()

Boolean: URL network location is an IPv6 address, not a domain?

is_ip()

Boolean: URL network location is an IP address, not a domain?

domain()

Deobfuscated domain; undefined behavior if self.is_ip().

is_domain()

Boolean: URL network location might be a valid domain?

deobfuscated()

Named method for clarity, same as str(my_url_object).

class threatingestor.artifacts.IPAddress(artifact, source_name, reference_link=None, reference_text=None)

Bases: threatingestor.artifacts.Artifact

IP address artifact abstraction.

Use version and ipaddress() for processing.

format_message(message, **kwargs)

Allow string interpolation with artifact contents.

Supported variables:

  • {ipaddress}
  • {defanged}
  • All supported variables from Artifact.format_message

version

Returns 4, 6, or None.

ipaddress()

Return ipaddress.IPv4Address or ipaddress.IPv6Address object, or raise ValueError.
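The relationship between `version` and `ipaddress()` can be illustrated with the standard library alone. The helper `ip_version` below is hypothetical, and unlike the real class it does no deobfuscation of defanged input.

```python
import ipaddress


def ip_version(text):
    """Return 4, 6, or None, mirroring the documented `version` values."""
    try:
        # ip_address() returns an IPv4Address or IPv6Address object,
        # or raises ValueError for anything it cannot parse.
        return ipaddress.ip_address(text).version
    except ValueError:
        return None


for candidate in ("192.0.2.1", "2001:db8::1", "not-an-ip"):
    print(candidate, ip_version(candidate))
```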

class threatingestor.artifacts.Domain(artifact, source_name, reference_link=None, reference_text=None)

Bases: threatingestor.artifacts.Artifact

Domain artifact abstraction.

format_message(message, **kwargs)

Allow string interpolation with artifact contents.

Supported variables:

  • {domain}
  • {defanged}
  • All supported variables from Artifact.format_message

class threatingestor.artifacts.Hash(artifact, source_name, reference_link=None, reference_text=None)

Bases: threatingestor.artifacts.Artifact

Hash artifact abstraction.

MD5 = 'md5'
SHA1 = 'sha1'
SHA256 = 'sha256'
SHA512 = 'sha512'

format_message(message, **kwargs)

Allow string interpolation with artifact contents.

Supported variables:

  • {hash}
  • {hash_type}
  • All supported variables from Artifact.format_message

hash_type()

Return the hash type as a string, or None.
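One plausible way to derive such a type string is from the length of the hex digest. The sketch below assumes that approach; the source does not say how `hash_type()` decides, and `guess_hash_type` is a hypothetical helper. The returned strings match the class constants listed above.

```python
import re

# Hex digest length -> type string, matching the documented constants.
_HEX_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256", 128: "sha512"}


def guess_hash_type(value):
    """Return 'md5', 'sha1', 'sha256', 'sha512', or None.

    Assumption: the type is inferred purely from the hex digest length.
    """
    if not re.fullmatch(r"[0-9a-fA-F]+", value):
        return None
    return _HEX_LENGTHS.get(len(value))


print(guess_hash_type("d41d8cd98f00b204e9800998ecf8427e"))
```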

class threatingestor.artifacts.YARASignature(artifact, source_name, reference_link=None, reference_text=None)

Bases: threatingestor.artifacts.Artifact

YARA signature artifact abstraction.

format_message(message, **kwargs)

Allow string interpolation with artifact contents.

Supported variables:

  • {yarasignature}
  • All supported variables from Artifact.format_message

class threatingestor.artifacts.Task(artifact, source_name, reference_link=None, reference_text=None)

Bases: threatingestor.artifacts.Artifact

Generic Task artifact abstraction.

format_message(message, **kwargs)

Allow string interpolation with artifact contents.

Supported variables:

  • {task}
  • All supported variables from Artifact.format_message

config

class threatingestor.config.Config(filename)

Bases: object

Config read/write operations, and convenience methods.

daemon()

Returns boolean, are we daemonizing?

state_path()

Returns path of state.db file.

sleep()

Returns number of seconds to sleep between iterations, if daemonizing.

statsd()

Returns statsd config dictionary.

notifiers()

Returns notifiers config dictionary.

logging()

Returns logging config dictionary.

credentials(credential_name)

Return a dictionary with the specified credentials.

sources()

Return a list of (name, Source class, {kwargs}) tuples.

Raises: threatingestor.exceptions.PluginError

operators()

Return a list of (name, Operator class, {kwargs}) tuples.

Raises: threatingestor.exceptions.PluginError

whitelists()

Returns the whitelist as a list.

exceptions

exception threatingestor.exceptions.IngestorError

Bases: Exception

Base exception class.

exception threatingestor.exceptions.DependencyError

Bases: threatingestor.exceptions.IngestorError

Missing dependency.

exception threatingestor.exceptions.PluginError

Bases: threatingestor.exceptions.IngestorError

Missing plugin.

state

class threatingestor.state.State(dbname)

Bases: object

State DB management.

save_state(name, state)

Create or update a state record.

get_state(name)

Return the state string for a given plugin.

operators

class threatingestor.operators.Operator(artifact_types=None, filter_string=None, allowed_sources=None)

Bases: object

Base class for all Operator plugins.

Note: This is an abstract class. You must extend __init__ and call super to ensure this class’s constructor is called. You must override handle_artifact with the same signature. You may define additional handle_{artifact_type} methods as needed (see the threatkb operator for an example) - these methods are purely convention, and are not required.

When adding additional methods to child classes, consider prefixing the method name with an underscore to denote a _private_method. Do not override other existing methods from this class.
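The subclassing rules above can be sketched as follows. The `Operator` base here is a simplified, hypothetical stand-in (its `process` only filters by artifact type, which is an assumption), and `Domain`, `Hash`, the `prefix` argument, and `_record` are invented for illustration.

```python
class Domain(str):
    """Hypothetical stand-in artifact type."""


class Hash(str):
    """Hypothetical stand-in artifact type."""


class Operator:
    """Simplified stand-in for threatingestor.operators.Operator."""

    def __init__(self, artifact_types=None, filter_string=None, allowed_sources=None):
        self.artifact_types = artifact_types or []
        self.filter_string = filter_string
        self.allowed_sources = allowed_sources

    def handle_artifact(self, artifact):
        raise NotImplementedError

    def process(self, artifacts):
        # Assumption: only type filtering is modelled in this sketch.
        for artifact in artifacts:
            if not self.artifact_types or isinstance(artifact, tuple(self.artifact_types)):
                self.handle_artifact(artifact)


class Plugin(Operator):
    def __init__(self, prefix, artifact_types=None, filter_string=None, allowed_sources=None):
        # Extend __init__ and call super so the base constructor runs.
        super().__init__(artifact_types, filter_string, allowed_sources)
        self.prefix = prefix
        self.seen = []

    def handle_artifact(self, artifact):
        # Same signature as the base class; the return value is ignored.
        self._record(artifact)

    def _record(self, artifact):
        # Underscore prefix marks a private helper; process() is left untouched.
        self.seen.append(self.prefix + str(artifact))


op = Plugin("ioc:", artifact_types=[Domain])
op.process([Domain("example.com"), Hash("a" * 32)])
print(op.seen)
```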

handle_artifact(artifact)

Override with the same signature.

Parameters: artifact – A single Artifact object.
Returns: None (always ignored)

process(artifacts)

Process all applicable artifacts.

abstract_json

class threatingestor.operators.abstract_json.AbstractPlugin(artifact_types=None, filter_string=None, allowed_sources=None, **kwargs)

Bases: threatingestor.operators.Operator

Operator for Abstract JSON.

handle_artifact(artifact)

Operate on a single artifact.

beanstalk

class threatingestor.operators.beanstalk.Plugin(host, port, queue_name, artifact_types=None, filter_string=None, allowed_sources=None, **kwargs)

Bases: threatingestor.operators.abstract_json.AbstractPlugin

Operator for Beanstalk work queue.

csv

class threatingestor.operators.csv.Plugin(filename, artifact_types=None, filter_string=None, allowed_sources=None)

Bases: threatingestor.operators.Operator

Operator for output to flat CSV file.

handle_artifact(artifact)

Operate on a single artifact.

misp

class threatingestor.operators.misp.Plugin(url, key, ssl=True, tags=None, artifact_types=None, filter_string=None, allowed_sources=None)

Bases: threatingestor.operators.Operator

Operator for MISP.

handle_artifact(artifact)

Operate on a single artifact.

handle_domain(domain, event: pymisp.mispevent.MISPEvent)

Handle a single domain.

handle_hash(hash_, event: pymisp.mispevent.MISPEvent)

Handle a single hash.

handle_ipaddress(ipaddress, event: pymisp.mispevent.MISPEvent)

Handle a single IP address.

handle_url(url, event: pymisp.mispevent.MISPEvent)

Handle a single URL.

handle_yarasignature(yarasignature, event: pymisp.mispevent.MISPEvent)

Handle a single YARA signature.

sqlite

class threatingestor.operators.sqlite.Plugin(filename, artifact_types=None, filter_string=None, allowed_sources=None)

Bases: threatingestor.operators.Operator

Operator for SQLite3.

handle_artifact(artifact)

Operate on a single artifact.

sqs

class threatingestor.operators.sqs.Plugin(aws_access_key_id, aws_secret_access_key, aws_region, queue_name, artifact_types=None, filter_string=None, allowed_sources=None, **kwargs)

Bases: threatingestor.operators.abstract_json.AbstractPlugin

Operator for Amazon SQS.

threatkb

class threatingestor.operators.threatkb.Plugin(url, token, secret_key, state, artifact_types=None, filter_string=None, allowed_sources=None, use_https=False)

Bases: threatingestor.operators.Operator

Operator for InQuest ThreatKB.

handle_artifact(artifact)

Operate on a single artifact.

handle_domain(domain)

Handle a single domain.

handle_ipaddress(ipaddress)

Handle a single IP address.

handle_yarasignature(yarasignature)

Handle a single YARA signature.

handle_task(task)

Handle a single Task.

sources

class threatingestor.sources.Source(name, *args, **kwargs)

Bases: object

Base class for all Source plugins.

Note: This is an abstract class. You must override __init__ and run in child classes. You should not override process_element. When adding additional methods to child classes, consider prefixing the method name with an underscore to denote a _private_method.

run(saved_state)

Run and return (saved_state, list(Artifact)).

Override this method in child classes.

The method signature and return values must remain consistent.

The method should attempt to pick up where we left off using saved_state, if supported. If saved_state is None, you can assume this is a first run. If state is maintained by the remote resource (e.g. as it is with SQS), saved_state should always be None.
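The `run` contract above can be sketched with a toy plugin. Everything here is hypothetical: the `Source` base is a stand-in whose `process_element` just splits on whitespace instead of extracting IOCs, the `items` constructor argument stands in for a remote feed, and storing the last seen item ID as the state string is only one possible choice.

```python
class Source:
    """Simplified stand-in for threatingestor.sources.Source."""

    def __init__(self, name, *args, **kwargs):
        self.name = name

    def run(self, saved_state):
        raise NotImplementedError

    def process_element(self, content, reference_link, include_nonobfuscated=False):
        # Stand-in: the real method extracts IOCs and builds Artifact objects.
        return content.split()


class Plugin(Source):
    def __init__(self, name, items):
        super().__init__(name)
        # (item_id, content) pairs in ascending ID order.
        self.items = items

    def run(self, saved_state):
        # saved_state is None on the first run; otherwise resume after it.
        last_seen = int(saved_state) if saved_state is not None else -1
        artifacts = []
        for item_id, content in self.items:
            if item_id <= last_seen:
                continue
            link = "https://example.org/%d" % item_id
            artifacts += self.process_element(content, link)
            last_seen = item_id
        # Keep the documented return shape: (saved_state, list of artifacts).
        return str(last_seen), artifacts


source = Plugin("demo", [(1, "a.example b.example"), (2, "c.example")])
state, first = source.run(None)
state_again, second = source.run(state)
print(state, first, second)
```

The second call returns no artifacts because the saved state already covers every item.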

process_element(content, reference_link, include_nonobfuscated=False)

Take a single source content/url and return a list of Artifacts.

This is the main work block of Source plugins, which handles IOC extraction and artifact creation.

Parameters:
  • content – String content to extract from.
  • reference_link – Reference link to attach to all artifacts.
  • include_nonobfuscated – Include non-defanged URLs in output?

abstract_json

class threatingestor.sources.abstract_json.AbstractPlugin(name, paths, reference=None, **kwargs)

Bases: threatingestor.sources.Source

get_objects(saved_state)

Produce an iterable of dict or list objects containing raw content to process.

Override in child class.

run(saved_state)

Run and return (saved_state, list(Artifact)).

beanstalk

class threatingestor.sources.beanstalk.Plugin(name, host, port, queue_name, paths, reference=None)

Bases: threatingestor.sources.abstract_json.AbstractPlugin

Source for Beanstalk work queue.

get_objects(saved_state)

Produce an iterable of dict or list objects containing raw content to process.

Override in child class.

git

class threatingestor.sources.git.Plugin(name, url, local_path)

Bases: threatingestor.sources.Source

run(saved_state)

Run and return (saved_state, list(Artifact)).

Override this method in child classes.

The method signature and return values must remain consistent.

The method should attempt to pick up where we left off using saved_state, if supported. If saved_state is None, you can assume this is a first run. If state is maintained by the remote resource (e.g. as it is with SQS), saved_state should always be None.

github

class threatingestor.sources.github.Plugin(name, search, username='', token='')

Bases: threatingestor.sources.Source

GitHub source plugin.

run(saved_state)

Run and return (saved_state, list(Artifact)).

rss

class threatingestor.sources.rss.Plugin(name, url, feed_type)

Bases: threatingestor.sources.Source

run(saved_state)

Run and return (saved_state, list(Artifact)).

Override this method in child classes.

The method signature and return values must remain consistent.

The method should attempt to pick up where we left off using saved_state, if supported. If saved_state is None, you can assume this is a first run. If state is maintained by the remote resource (e.g. as it is with SQS), saved_state should always be None.

sqs

class threatingestor.sources.sqs.Plugin(name, aws_access_key_id, aws_secret_access_key, aws_region, queue_name, paths, reference=None)

Bases: threatingestor.sources.abstract_json.AbstractPlugin

Source for Amazon SQS.

get_objects(saved_state)

Produce an iterable of dict or list objects containing raw content to process.

Override in child class.

twitter

class threatingestor.sources.twitter.Plugin(name, api_key, api_secret_key, access_token, access_token_secret, defanged_only=True, **kwargs)

Bases: threatingestor.sources.Source

run(saved_state)

Run and return (saved_state, list(Artifact)).

Override this method in child classes.

The method signature and return values must remain consistent.

The method should attempt to pick up where we left off using saved_state, if supported. If saved_state is None, you can assume this is a first run. If state is maintained by the remote resource (e.g. as it is with SQS), saved_state should always be None.

web

class threatingestor.sources.web.Plugin(name, url)

Bases: threatingestor.sources.Source

run(saved_state)

Run and return (saved_state, list(Artifact)).

Override this method in child classes.

The method signature and return values must remain consistent.

The method should attempt to pick up where we left off using saved_state, if supported. If saved_state is None, you can assume this is a first run. If state is maintained by the remote resource (e.g. as it is with SQS), saved_state should always be None.

extras

webapp

threatingestor.extras.webapp.list_view(table)
threatingestor.extras.webapp.html_view(table)

queueworker

class threatingestor.extras.queueworker.QueueWorker

Bases: object

Abstract base class for Queue Workers.

Override do_work to implement child classes.

read_config(filename)

Read from a config YAML file and set up queues.

run_forever()

Do work forever.

Note: There is no sleep here! If anything fails, it might spin out of control.

do_work(job)

Implement in child class.

One or zero jobs are passed in from the queue. One or zero jobs are returned and sent to the queue.
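The one-in, one-out contract of `do_work` can be sketched with in-memory queues. This is an illustration only: `InMemoryQueue`, the two-argument `QueueWorker` constructor, and the single-iteration `step` method are hypothetical, and passing `None` to `do_work` when the queue is empty is an assumption.

```python
from collections import deque


class InMemoryQueue:
    """Hypothetical queue exposing the read_one/write_one interface."""

    def __init__(self, items=()):
        self.items = deque(items)

    def read_one(self):
        # One message, or None when the queue is empty.
        return self.items.popleft() if self.items else None

    def write_one(self, content):
        self.items.append(content)


class QueueWorker:
    """Simplified stand-in for the documented base class."""

    def __init__(self, in_queue, out_queue):
        self.in_queue = in_queue
        self.out_queue = out_queue

    def do_work(self, job):
        raise NotImplementedError

    def step(self):
        # One loop iteration: read zero or one job, write zero or one result.
        result = self.do_work(self.in_queue.read_one())
        if result is not None:
            self.out_queue.write_one(result)


class UppercaseWorker(QueueWorker):
    def do_work(self, job):
        # Zero jobs in means zero jobs out.
        return None if job is None else job.upper()


out_queue = InMemoryQueue()
worker = UppercaseWorker(InMemoryQueue(["paste one"]), out_queue)
worker.step()
worker.step()  # queue is empty now, so nothing is written
print(list(out_queue.items))
```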

class threatingestor.extras.queueworker.SQSInterface(aws_access_key_id, aws_secret_access_key, aws_region, in_queue=None, out_queue=None)

Bases: object

A consistent Queue interface for SQS.

read_one()

Read one or zero messages from the queue.

Returns: Message body or None.

write_one(content)

Write one message to the queue, if it exists.

class threatingestor.extras.queueworker.BeanstalkInterface(host, port, in_queue=None, out_queue=None)

Bases: object

A consistent Queue interface for Beanstalk.

read_one()

Read one or zero messages from the queue.

Returns: Message body or None.

write_one(content)

Write one message to the queue, if it exists.

fswatcher

class threatingestor.extras.fswatcher.FSWatcher(patterns=None, ignore_patterns=None, ignore_directories=False, case_sensitive=False)

Bases: watchdog.events.PatternMatchingEventHandler, threatingestor.extras.queueworker.QueueWorker

Watch a directory for YARA rule changes.

Send contents of the changed rule files to the queue.

patterns = ['*.yar', '*.yara', '*.rule', '*.rules']

process(event)

Handle a file event.

on_modified(event)

Called when a file or directory is modified.

Parameters: event (DirModifiedEvent or FileModifiedEvent) – Event representing file/directory modification.

on_created(event)

Called when a file or directory is created.

Parameters: event (DirCreatedEvent or FileCreatedEvent) – Event representing file/directory creation.

pasteprocessor

class threatingestor.extras.pasteprocessor.PasteProcessor

Bases: threatingestor.extras.queueworker.QueueWorker

Read pastebin URLs from a queue, write raw content to a queue.

do_work(job)

From a paste URL, get the raw contents.