API Documentation
This is the API documentation for ThreatIngestor.
threatingestor
- class threatingestor.Ingestor(config_file)
  Bases: object
  ThreatIngestor main work logic.
  Handles reading the config file, calling sources, maintaining state, and sending artifacts to operators.
  - run(): Run once, or forever, depending on config.
  - run_once(): Run each source once, passing artifacts to each operator.
  - run_forever(): Run forever, sleeping for the configured interval between each run.
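The run / run_once / run_forever split can be pictured with the minimal sketch below. It is not ThreatIngestor's implementation. MiniIngestor, its daemon and sleep attributes, and the lambda source are hypothetical. daemon and sleep only mirror the Config.daemon() and Config.sleep() methods. State is kept in a dict rather than state.db.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-ins: a source maps saved state -> (new state, artifacts);
# an operator consumes a list of artifacts.
SourceFn = Callable[[Optional[str]], Tuple[Optional[str], List[str]]]
OperatorFn = Callable[[List[str]], None]


class MiniIngestor:
    """Sketch of the run / run_once / run_forever control flow."""

    def __init__(self, sources: Dict[str, SourceFn], operators: List[OperatorFn],
                 daemon: bool = False, sleep: float = 900):
        self.sources = sources
        self.operators = operators
        self.daemon = daemon      # plays the role of Config.daemon()
        self.sleep = sleep        # plays the role of Config.sleep(), in seconds
        self.state: Dict[str, Optional[str]] = {}  # in memory, not state.db

    def run(self) -> None:
        # Run once, or forever, depending on config.
        if self.daemon:
            self.run_forever()
        else:
            self.run_once()

    def run_once(self) -> None:
        # Run each source once, passing its artifacts to every operator.
        for name, source in self.sources.items():
            saved_state, artifacts = source(self.state.get(name))
            self.state[name] = saved_state
            for operator in self.operators:
                operator(artifacts)

    def run_forever(self) -> None:
        # Sleep for the configured interval between runs.
        while True:
            self.run_once()
            time.sleep(self.sleep)


collected: List[str] = []
ingestor = MiniIngestor({"demo": lambda state: ("cursor-1", ["hxxp://evil[.]example"])},
                        [collected.extend])
ingestor.run()
print(collected, ingestor.state)
# ['hxxp://evil[.]example'] {'demo': 'cursor-1'}
```

Each source's returned state is stored per source name and handed back on the next pass. That is what lets a source pick up where it left off.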
- threatingestor.artifact_types(artifact_list): Return a dictionary with counts of each artifact type.
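A counting helper like artifact_types can be sketched with collections.Counter. This reference does not say how the dictionary keys are named, so keying by class name is an assumption. The small Artifact, URL, and Domain classes are stand-ins for the real ones.

```python
from collections import Counter


class Artifact:
    """Minimal stand-in for threatingestor.artifacts.Artifact."""

    def __init__(self, artifact, source_name):
        self.artifact = artifact
        self.source_name = source_name


class URL(Artifact):
    pass


class Domain(Artifact):
    pass


def artifact_types(artifact_list):
    """Count artifacts per type, keyed by class name (an assumed key format)."""
    return dict(Counter(type(a).__name__ for a in artifact_list))


counts = artifact_types([
    URL("hxxp://a[.]example", "rss"),
    URL("hxxp://b[.]example", "rss"),
    Domain("c.example", "rss"),
])
print(counts)  # {'URL': 2, 'Domain': 1}
```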
- threatingestor.main(): CLI entry point; uses sys.argv directly.
artifacts
- class threatingestor.artifacts.Artifact(artifact, source_name, reference_link=None, reference_text=None)
  Bases: object
  Artifact base class.
  - match(pattern): Return True if regex pattern matches the deobfuscated artifact, else False.
    May be overridden or extended by child classes.
  - format_message(message, **kwargs): Allow string interpolation with artifact contents.
    Optionally extend in child classes to add support for more specific interpolations.
    Supported variables:
    - {artifact}
    - {reference_text}
    - {reference_link}
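The match and format_message contracts can be illustrated with the stand-alone sketch below. It is not the library code. It makes three assumptions: the base class does no defanging, so the "deobfuscated" form is the raw string; extra **kwargs act as additional interpolation variables; and matching uses re.search.

```python
import re


class Artifact:
    """Stand-alone sketch of the documented Artifact interface."""

    def __init__(self, artifact, source_name, reference_link=None, reference_text=None):
        self.artifact = artifact
        self.source_name = source_name
        self.reference_link = reference_link or ""
        self.reference_text = reference_text or ""

    def __str__(self):
        # Assumption: no defanging rules at this level.
        return self.artifact

    def match(self, pattern):
        # True if the regex matches anywhere in the deobfuscated artifact.
        return bool(re.search(pattern, str(self)))

    def format_message(self, message, **kwargs):
        variables = {
            "artifact": str(self),
            "reference_text": self.reference_text,
            "reference_link": self.reference_link,
        }
        variables.update(kwargs)  # assumption: kwargs add or override variables
        return message.format(**variables)


a = Artifact("evil.example", "demo-feed", reference_link="https://example.org/post/1")
print(a.format_message("{artifact} seen at {reference_link}"))
# evil.example seen at https://example.org/post/1
print(a.match(r"\.example$"))  # True
```

Child classes extend format_message by adding their own keys (for example {url} or {hash}) to the same dictionary before formatting.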
- class threatingestor.artifacts.URL(artifact, source_name, reference_link=None, reference_text=None)
  Bases: threatingestor.artifacts.Artifact
  URL artifact abstraction, unicode-safe.
  - match(pattern): Filter on predefined conditions or a regex pattern.
    If pattern can be parsed as one of the conditions below, return the truthiness of the resulting expression; otherwise, pattern is treated as a regex.
    Valid conditions:
    - is_obfuscated
    - is_ipv4
    - is_ipv6
    - is_ip
    - is_domain
    - not {any above condition}
    - {any comma-separated list of above conditions}
    For example:
    - is_obfuscated, not is_ip
    - not is_obfuscated, is_domain
  - format_message(message, **kwargs): Allow string interpolation with artifact contents.
    Supported variables:
    - {url}
    - {defanged}
    - {domain}
    - All supported variables from Artifact.format_message
  - is_obfuscated(): Boolean: is this an obfuscated URL?
  - is_ipv4(): Boolean: is the URL's network location an IPv4 address rather than a domain?
  - is_ipv6(): Boolean: is the URL's network location an IPv6 address rather than a domain?
  - is_ip(): Boolean: is the URL's network location an IP address rather than a domain?
  - domain(): Deobfuscated domain; undefined behavior if self.is_ip().
  - is_domain(): Boolean: might the URL's network location be a valid domain?
  - deobfuscated(): Named method for clarity; same as str(my_url_object).
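The URL predicates and the condition-or-regex behavior of match can be sketched as below. URLSketch is hypothetical. It makes several assumptions:

- It undoes only the "hxxp" and "[.]" defanging conventions.
- is_domain uses a rough "non-IP host containing a dot" heuristic.
- A comma-separated condition list requires every condition to hold. The reference above does not state how the list is combined.

```python
import ipaddress
import re
from urllib.parse import urlsplit


class URLSketch:
    """Hypothetical stand-in for the URL artifact's predicates and match()."""

    def __init__(self, artifact):
        self.artifact = artifact

    def deobfuscated(self):
        # Assumption: only two common defanging conventions are undone here.
        return self.artifact.replace("hxxp", "http").replace("[.]", ".")

    def _host(self):
        return urlsplit(self.deobfuscated()).hostname or ""

    def _host_ip(self):
        try:
            return ipaddress.ip_address(self._host())
        except ValueError:
            return None

    def is_obfuscated(self):
        return self.deobfuscated() != self.artifact

    def is_ipv4(self):
        ip = self._host_ip()
        return ip is not None and ip.version == 4

    def is_ipv6(self):
        ip = self._host_ip()
        return ip is not None and ip.version == 6

    def is_ip(self):
        return self._host_ip() is not None

    def is_domain(self):
        # Rough heuristic: a non-IP host containing at least one dot.
        return not self.is_ip() and "." in self._host()

    def match(self, pattern):
        conditions = {
            "is_obfuscated": self.is_obfuscated,
            "is_ipv4": self.is_ipv4,
            "is_ipv6": self.is_ipv6,
            "is_ip": self.is_ip,
            "is_domain": self.is_domain,
        }
        results = []
        for term in (t.strip() for t in pattern.split(",")):
            negate = term.startswith("not ")
            name = term[4:].strip() if negate else term
            if name not in conditions:
                # Not a condition list: treat the whole pattern as a regex.
                return bool(re.search(pattern, self.deobfuscated()))
            value = conditions[name]()
            results.append(not value if negate else value)
        # Assumption: every listed condition must hold.
        return all(results)


u = URLSketch("hxxp://evil[.]example/path")
print(u.match("is_obfuscated, not is_ip"))           # True
print(u.match("not is_obfuscated, is_domain"))       # False
print(URLSketch("http://[2001:db8::1]/").is_ipv6())  # True
print(u.match(r"evil\.example"))                     # True (regex)
```

A regex that happens to contain commas, such as `a{1,3}`, still falls through to the regex branch, because its pieces are not condition names.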
- class threatingestor.artifacts.IPAddress(artifact, source_name, reference_link=None, reference_text=None)
  Bases: threatingestor.artifacts.Artifact
  IP address artifact abstraction.
  Use version and ipaddress() for processing.
  - format_message(message, **kwargs): Allow string interpolation with artifact contents.
    Supported variables:
    - {ipaddress}
    - {defanged}
    - All supported variables from Artifact.format_message
  - version: Returns 4, 6, or None.
  - ipaddress(): Return an ipaddress.IPv4Address or ipaddress.IPv6Address object, or raise ValueError.
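The version / ipaddress() contract maps naturally onto the standard-library ipaddress module. The sketch below uses a hypothetical IPAddressSketch class and handles only the "[.]" defanging convention. ipaddress() raises ValueError for invalid input, and version turns that failure into None.

```python
import ipaddress


class IPAddressSketch:
    """Hypothetical stand-in showing the documented version / ipaddress() contract."""

    def __init__(self, artifact):
        # Assumption: the only defanging handled here is "[.]" -> ".".
        self.artifact = artifact.replace("[.]", ".")

    def ipaddress(self):
        # Raises ValueError for strings that are not valid IPv4/IPv6 addresses.
        return ipaddress.ip_address(self.artifact)

    @property
    def version(self):
        try:
            return self.ipaddress().version
        except ValueError:
            return None


print(IPAddressSketch("10.0.0[.]1").version)   # 4
print(IPAddressSketch("2001:db8::1").version)  # 6
print(IPAddressSketch("not-an-ip").version)    # None
```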
- class threatingestor.artifacts.Domain(artifact, source_name, reference_link=None, reference_text=None)
  Bases: threatingestor.artifacts.Artifact
  Domain artifact abstraction.
  - format_message(message, **kwargs): Allow string interpolation with artifact contents.
    Supported variables:
    - {domain}
    - {defanged}
    - All supported variables from Artifact.format_message
- class threatingestor.artifacts.Hash(artifact, source_name, reference_link=None, reference_text=None)
  Bases: threatingestor.artifacts.Artifact
  Hash artifact abstraction.
  - MD5 = 'md5'
  - SHA1 = 'sha1'
  - SHA256 = 'sha256'
  - SHA512 = 'sha512'
  - format_message(message, **kwargs): Allow string interpolation with artifact contents.
    Supported variables:
    - {hash}
    - {hash_type}
    - All supported variables from Artifact.format_message
  - hash_type(): Return the hash type as a string, or None.
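One plausible way to implement hash_type is to infer the type from the length of a hex digest: MD5 is 32 hex characters, SHA1 is 40, SHA256 is 64, and SHA512 is 128. The sketch below assumes that approach. The library may detect types differently. The constants reuse the documented class attribute values.

```python
import string

MD5, SHA1, SHA256, SHA512 = "md5", "sha1", "sha256", "sha512"

# Assumption: the type is inferred purely from the hex digest length.
_LENGTHS = {32: MD5, 40: SHA1, 64: SHA256, 128: SHA512}


def hash_type(value):
    """Return the hash type as a string, or None if it is not a known hex digest."""
    value = value.strip()
    if not value or any(c not in string.hexdigits for c in value):
        return None
    return _LENGTHS.get(len(value))


print(hash_type("d41d8cd98f00b204e9800998ecf8427e"))          # md5
print(hash_type("da39a3ee5e6b4b0d3255bfef95601890afd80709"))  # sha1
print(hash_type("xyz"))                                       # None
```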
- class threatingestor.artifacts.YARASignature(artifact, source_name, reference_link=None, reference_text=None)
  Bases: threatingestor.artifacts.Artifact
  YARA signature artifact abstraction.
  - format_message(message, **kwargs): Allow string interpolation with artifact contents.
    Supported variables:
    - {yarasignature}
    - All supported variables from Artifact.format_message
- class threatingestor.artifacts.Email(artifact, source_name, reference_link=None, reference_text=None)
  Bases: threatingestor.artifacts.Artifact
  Email artifact abstraction.
  - format_message(message, **kwargs): Allow string interpolation with artifact contents.
    Supported variables:
    - {email}
    - All supported variables from Artifact.format_message
- class threatingestor.artifacts.Task(artifact, source_name, reference_link=None, reference_text=None)
  Bases: threatingestor.artifacts.Artifact
  Generic Task artifact abstraction.
  - format_message(message, **kwargs): Allow string interpolation with artifact contents.
    Supported variables:
    - {task}
    - All supported variables from Artifact.format_message
config
- class threatingestor.config.Config(filename)
  Bases: object
  Config read/write operations and convenience methods.
  - daemon(): Returns a boolean: are we daemonizing?
  - state_path(): Returns the path of the state.db file.
  - sleep(): Returns the number of seconds to sleep between iterations, if daemonizing.
  - statsd(): Returns the statsd config dictionary.
  - notifiers(): Returns the notifiers config dictionary.
  - logging(): Returns the logging config dictionary.
  - credentials(credential_name): Return a dictionary with the specified credentials.
  - sources(): Return a list of (name, Source class, {kwargs}) tuples.
    Raises: threatingestor.exceptions.PluginError
  - operators(): Return a list of (name, Operator class, {kwargs}) tuples.
    Raises: threatingestor.exceptions.PluginError
  - whitelists(): Returns the whitelist list.
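The (name, class, {kwargs}) tuples returned by sources() can be pictured with the sketch below. It works on an already-parsed config dict. The entry keys name and module, the SOURCE_PLUGINS registry, and the RSSSource class are all assumptions for illustration; the real Config resolves plugin modules itself. Everything left in an entry after name and module is passed through as constructor kwargs.

```python
class PluginError(Exception):
    """Stand-in for threatingestor.exceptions.PluginError."""


class RSSSource:
    """Hypothetical plugin class."""


# Hypothetical registry mapping a config "module" value to a plugin class.
SOURCE_PLUGINS = {"rss": RSSSource}


def sources(config):
    """Return a list of (name, Source class, {kwargs}) tuples."""
    result = []
    for entry in config.get("sources", []):
        kwargs = dict(entry)  # copy so the parsed config is not mutated
        name = kwargs.pop("name")
        module = kwargs.pop("module")
        if module not in SOURCE_PLUGINS:
            raise PluginError(f"source plugin {module!r} not found")
        result.append((name, SOURCE_PLUGINS[module], kwargs))
    return result


config = {"sources": [{"name": "demo-rss", "module": "rss", "url": "https://example.org/feed"}]}
for name, plugin, kwargs in sources(config):
    print(name, plugin.__name__, kwargs)
# demo-rss RSSSource {'url': 'https://example.org/feed'}
```

The caller can then instantiate each plugin as `plugin(name, **kwargs)`.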
exceptions
- exception threatingestor.exceptions.DependencyError
  Bases: threatingestor.exceptions.IngestorError
  Missing dependency.
- exception threatingestor.exceptions.PluginError
  Bases: threatingestor.exceptions.IngestorError
  Missing plugin.
state
operators
- class threatingestor.operators.Operator(artifact_types=None, filter_string=None, allowed_sources=None)
  Bases: object
  Base class for all Operator plugins.
  Note: This is an abstract class. You must extend __init__ and call super to ensure this class’s constructor is called. You must override handle_artifact with the same signature. You may define additional handle_{artifact_type} methods as needed (see the threatkb operator for an example); these methods are purely convention and are not required. When adding additional methods to child classes, consider prefixing the method name with an underscore to denote a _private_method. Do not override other existing methods from this class.
  - handle_artifact(artifact): Override with the same signature.
    Parameters: artifact – A single Artifact object.
    Returns: None (always ignored)
  - process(artifacts): Process all applicable artifacts.
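The subclassing rules above can be sketched as follows: extend __init__ and call super, override handle_artifact, and leave process alone. The Operator here is a stand-in. Its filtering rules for artifact_types, filter_string, and allowed_sources are assumptions about what "applicable" means, not the library's exact logic. ListOperator is a hypothetical plugin.

```python
import re


class Artifact:
    def __init__(self, artifact, source_name):
        self.artifact = artifact
        self.source_name = source_name

    def match(self, pattern):
        return bool(re.search(pattern, self.artifact))


class Domain(Artifact):
    pass


class URL(Artifact):
    pass


class Operator:
    """Stand-in base class; its filtering rules are assumptions."""

    def __init__(self, artifact_types=None, filter_string=None, allowed_sources=None):
        self.artifact_types = artifact_types or []
        self.filter_string = filter_string or ""
        self.allowed_sources = allowed_sources or []

    def handle_artifact(self, artifact):
        raise NotImplementedError

    def process(self, artifacts):
        for artifact in artifacts:
            if self.artifact_types and not isinstance(artifact, tuple(self.artifact_types)):
                continue
            if self.allowed_sources and artifact.source_name not in self.allowed_sources:
                continue
            if self.filter_string and not artifact.match(self.filter_string):
                continue
            self.handle_artifact(artifact)


class ListOperator(Operator):
    """Hypothetical plugin: extends __init__, calls super, overrides handle_artifact."""

    def __init__(self, artifact_types=None, filter_string=None, allowed_sources=None):
        super().__init__(artifact_types, filter_string, allowed_sources)
        self.seen = []

    def handle_artifact(self, artifact):
        self.seen.append(artifact.artifact)


op = ListOperator(artifact_types=[Domain], filter_string=r"\.example$")
op.process([Domain("evil.example", "rss"), Domain("good.org", "rss"),
            URL("http://x.example", "rss")])
print(op.seen)  # ['evil.example']
```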
abstract_json
- class threatingestor.operators.abstract_json.AbstractPlugin(artifact_types=None, filter_string=None, allowed_sources=None, **kwargs)
  Bases: threatingestor.operators.Operator
  Operator for Abstract JSON.
  - handle_artifact(artifact): Operate on a single artifact.
beanstalk
- class threatingestor.operators.beanstalk.Plugin(host, port, queue_name, artifact_types=None, filter_string=None, allowed_sources=None, **kwargs)
  Bases: threatingestor.operators.abstract_json.AbstractPlugin
  Operator for a Beanstalk work queue.
csv
- class threatingestor.operators.csv.Plugin(filename, artifact_types=None, filter_string=None, allowed_sources=None)
  Bases: threatingestor.operators.Operator
  Operator for output to a flat CSV file.
  - handle_artifact(artifact): Operate on a single artifact.
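Writing one row per artifact with the standard csv module is one plausible reading of the CSV operator. The column order and the write_artifact_row helper below are hypothetical, since the plugin's real columns are not documented here. Using csv.writer rather than joining strings matters because URLs can contain commas, and the writer quotes such fields. A file opened with `open(filename, "a", newline="")` behaves like the StringIO buffer used here.

```python
import csv
import io


def write_artifact_row(fileobj, artifact, artifact_type, source_name, reference_link=""):
    """Write one artifact as a CSV row (hypothetical column order)."""
    csv.writer(fileobj).writerow([artifact_type, artifact, source_name, reference_link])


buf = io.StringIO()
write_artifact_row(buf, "evil.example", "Domain", "demo-rss", "https://example.org/post/1")
write_artifact_row(buf, "hxxp://a[.]example/?q=1,2", "URL", "demo-rss")
print(buf.getvalue())
```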
misp
sqlite
- class threatingestor.operators.sqlite.Plugin(filename, artifact_types=None, filter_string=None, allowed_sources=None)
  Bases: threatingestor.operators.Operator
  Operator for SQLite3.
  - handle_artifact(artifact): Operate on a single artifact.
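An SQLite-backed operator could store each artifact with a parameterized INSERT, as sketched below with the standard sqlite3 module. The table name, columns, and UNIQUE-based de-duplication are hypothetical choices, not the plugin's documented schema. The usage example uses `":memory:"` in place of a real filename.

```python
import sqlite3


def open_db(filename):
    # Hypothetical schema; the real plugin's tables are not documented here.
    conn = sqlite3.connect(filename)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS artifacts ("
        " artifact TEXT, artifact_type TEXT, source_name TEXT, reference_link TEXT,"
        " UNIQUE(artifact, artifact_type))"
    )
    return conn


def handle_artifact(conn, artifact, artifact_type, source_name, reference_link=""):
    # "with conn" commits on success; INSERT OR IGNORE skips duplicate rows.
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO artifacts VALUES (?, ?, ?, ?)",
            (artifact, artifact_type, source_name, reference_link),
        )


conn = open_db(":memory:")
handle_artifact(conn, "evil.example", "Domain", "demo-rss")
handle_artifact(conn, "evil.example", "Domain", "demo-rss")  # duplicate, ignored
print(conn.execute("SELECT COUNT(*) FROM artifacts").fetchone()[0])  # 1
```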
sqs
- class threatingestor.operators.sqs.Plugin(aws_access_key_id, aws_secret_access_key, aws_region, queue_name, artifact_types=None, filter_string=None, allowed_sources=None, **kwargs)
  Bases: threatingestor.operators.abstract_json.AbstractPlugin
  Operator for Amazon SQS.
threatkb
- class threatingestor.operators.threatkb.Plugin(url, token, secret_key, state, artifact_types=None, filter_string=None, allowed_sources=None, use_https=False)
  Bases: threatingestor.operators.Operator
  Operator for InQuest ThreatKB.
  - handle_artifact(artifact): Operate on a single artifact.
  - handle_domain(domain): Handle a single domain.
  - handle_ipaddress(ipaddress): Handle a single IP address.
  - handle_yarasignature(yarasignature): Handle a single YARA signature.
  - handle_task(task): Handle a single Task.
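The ThreatKB operator's handle_{artifact_type} method names match the lower-cased artifact class names (Domain, IPAddress, YARASignature, Task). The sketch below shows one way handle_artifact could dispatch to them with getattr. This is an assumption: the real operator may use explicit isinstance checks instead. The classes are stand-ins, and types without a handler are skipped.

```python
class Artifact:
    def __init__(self, artifact):
        self.artifact = artifact


class Domain(Artifact):
    pass


class IPAddress(Artifact):
    pass


class Hash(Artifact):
    pass


class DispatchingOperator:
    """Sketch: route handle_artifact to handle_<lower-cased class name>."""

    def __init__(self):
        self.log = []

    def handle_artifact(self, artifact):
        handler = getattr(self, "handle_" + type(artifact).__name__.lower(), None)
        if handler is not None:
            handler(artifact)
        # Types without a handler (here, Hash) are silently skipped.

    def handle_domain(self, domain):
        self.log.append(("domain", domain.artifact))

    def handle_ipaddress(self, ipaddress):
        self.log.append(("ipaddress", ipaddress.artifact))


op = DispatchingOperator()
for a in [Domain("evil.example"), IPAddress("192.0.2.1"),
          Hash("d41d8cd98f00b204e9800998ecf8427e")]:
    op.handle_artifact(a)
print(op.log)  # [('domain', 'evil.example'), ('ipaddress', '192.0.2.1')]
```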
sources
- class threatingestor.sources.Source(name, *args, **kwargs)
  Bases: object
  Base class for all Source plugins.
  Note: This is an abstract class. You must override __init__ and run in child classes. You should not override process_element. When adding additional methods to child classes, consider prefixing the method name with an underscore to denote a _private_method.
  - run(saved_state): Run and return (saved_state, list(Artifact)).
    Override this method in child classes.
    The method signature and return values must remain consistent.
    The method should attempt to pick up where we left off using saved_state, if supported. If saved_state is None, you can assume this is a first run. If state is maintained by the remote resource (e.g. as it is with SQS), saved_state should always be None.
  - process_element(content, reference_link, include_nonobfuscated=False): Take a single source content/URL and return a list of Artifacts.
    This is the main work block of Source plugins, which handles IOC extraction and artifact creation.
    Parameters:
    - content – String content to extract from.
    - reference_link – Reference link to attach to all artifacts.
    - include_nonobfuscated – Include non-defanged URLs in output?
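The run(saved_state) contract can be sketched with a hypothetical ListSource over an in-memory list of (id, text) posts. Here saved_state is the highest post id already processed, and None means a first run. The base class's process_element is a toy stand-in: it extracts only defanged domains and returns (value, source_name, reference_link) tuples in place of Artifact objects. The example.org reference links are also hypothetical.

```python
import re


class Source:
    """Stand-in base class; process_element is a toy extractor."""

    def __init__(self, name, *args, **kwargs):
        self.name = name

    def process_element(self, content, reference_link, include_nonobfuscated=False):
        # Toy IOC extraction: defanged domains only, e.g. "evil[.]example".
        return [(m, self.name, reference_link)
                for m in re.findall(r"[\w-]+\[\.\][\w.-]+", content)]


class ListSource(Source):
    """Hypothetical source; saved_state is the highest post id already seen."""

    def __init__(self, name, posts):
        super().__init__(name)
        self.posts = posts

    def run(self, saved_state):
        artifacts = []
        last = saved_state
        for post_id, text in self.posts:
            if saved_state is not None and post_id <= saved_state:
                continue  # already processed on a previous run
            artifacts.extend(self.process_element(text, f"https://example.org/{post_id}"))
            last = post_id if last is None else max(last, post_id)
        return last, artifacts


src = ListSource("demo", [(1, "see evil[.]example"), (2, "and bad[.]test too")])
state, found = src.run(None)
print(state, len(found))  # 2 2
state, found = src.run(state)
print(state, len(found))  # 2 0
```

The second run returns the same state and no artifacts, which is the "pick up where we left off" behavior described for run.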
abstract_json
- class threatingestor.sources.abstract_json.AbstractPlugin(name, paths, reference=None, **kwargs)
  Bases: threatingestor.sources.Source
  - get_objects(saved_state): Produce an iterable of dict or list objects containing raw content to process.
    Override in child class.
  - run(saved_state): Run and return (saved_state, list(Artifact)).
beanstalk
- class threatingestor.sources.beanstalk.Plugin(name, host, port, queue_name, paths, reference=None)
  Bases: threatingestor.sources.abstract_json.AbstractPlugin
  Source for a Beanstalk work queue.
  - get_objects(saved_state): Produce an iterable of dict or list objects containing raw content to process.
git
- class threatingestor.sources.git.Plugin(name, url, local_path)
  Bases: threatingestor.sources.Source
  - run(saved_state): Run and return (saved_state, list(Artifact)).
github
- class threatingestor.sources.github.Plugin(name, search, num_of_days=10, username='', token='')
  Bases: threatingestor.sources.Source
  GitHub Source Plugin.
  - run(saved_state): Returns the saved state and a list of artifacts.
github_gist
- threatingestor.sources.github_gist.user_set(user)
- class threatingestor.sources.github_gist.Plugin(name, user='', username='', token='')
  Bases: threatingestor.sources.Source
  GitHub Gist Source Plugin.
  - run(saved_state): Returns the saved state and a list of artifacts.
rss
sitemap
virustotal
- class threatingestor.sources.virustotal.Plugin(name, user, api_key, limit=10)
  Bases: threatingestor.sources.Source
  VirusTotal Comments Source Plugin.
  - run(saved_state): Returns the saved state and a list of artifacts.
image
sqs
- class threatingestor.sources.sqs.Plugin(name, aws_access_key_id, aws_secret_access_key, aws_region, queue_name, paths, reference=None)
  Bases: threatingestor.sources.abstract_json.AbstractPlugin
  Source for Amazon SQS.
  - get_objects(saved_state): Produce an iterable of dict or list objects containing raw content to process.
twitter
- threatingestor.sources.twitter.tmp_name()
- class threatingestor.sources.twitter.Plugin(name, api_key, api_secret_key, access_token, access_token_secret, defanged_only=True, **kwargs)
  Bases: threatingestor.sources.Source
  - run(saved_state): Run and return (saved_state, list(Artifact)).
web
- class threatingestor.sources.web.Plugin(name, url)
  Bases: threatingestor.sources.Source
  - run(saved_state): Run and return (saved_state, list(Artifact)).
extras
webapp
- threatingestor.extras.webapp.list_view(table)
- threatingestor.extras.webapp.html_view(table)
queueworker
- class threatingestor.extras.queueworker.QueueWorker
  Bases: object
  Abstract base class for Queue Workers.
  Override do_work to implement child classes.
  - read_config(filename): Read a YAML config file and set up queues.
  - run_forever(): Do work forever.
    Note: There is no sleep here! If anything fails, it might spin out of control.
  - do_work(job): Implement in child class.
    One or zero jobs are passed in from the queue. One or zero jobs are returned and sent to the queue.
- class threatingestor.extras.queueworker.SQSInterface(aws_access_key_id, aws_secret_access_key, aws_region, in_queue=None, out_queue=None)
  Bases: object
  A consistent Queue interface for SQS.
  - read_one(): Read one or zero messages from the queue.
    Returns: Message body or None.
  - write_one(content): Write one message to the queue, if it exists.
- class threatingestor.extras.queueworker.BeanstalkInterface(host, port, in_queue=None, out_queue=None)
  Bases: object
  A consistent Queue interface for Beanstalk.
  - read_one(): Read one or zero messages from the queue.
    Returns: Message body or None.
  - write_one(content): Write one message to the queue, if it exists.
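The worker pattern above pairs a do_work(job) method with a queue interface exposing read_one() and write_one(content). The sketch below uses a hypothetical in-memory MemoryInterface in place of SQSInterface or BeanstalkInterface, and a hypothetical UpperCaseWorker. Unlike the documented run_forever, this sketch sleeps when the queue is empty. That is a deliberate deviation, addressing the "no sleep here" warning.

```python
import time
from collections import deque


class MemoryInterface:
    """Hypothetical in-memory stand-in for SQSInterface / BeanstalkInterface."""

    def __init__(self):
        self.queue = deque()

    def read_one(self):
        # Message body, or None when the queue is empty.
        return self.queue.popleft() if self.queue else None

    def write_one(self, content):
        self.queue.append(content)


class UpperCaseWorker:
    """Hypothetical worker: do_work maps one job to zero or one output job."""

    def __init__(self, in_queue, out_queue, idle_sleep=1.0):
        self.in_queue = in_queue
        self.out_queue = out_queue
        self.idle_sleep = idle_sleep

    def do_work(self, job):
        return job.upper() if job else None

    def run_once(self):
        # Returns True if a job was consumed from the input queue.
        job = self.in_queue.read_one()
        if job is None:
            return False
        result = self.do_work(job)
        if result is not None:
            self.out_queue.write_one(result)
        return True

    def run_forever(self):
        # Unlike the documented run_forever, back off when the queue is idle.
        while True:
            if not self.run_once():
                time.sleep(self.idle_sleep)


inq, outq = MemoryInterface(), MemoryInterface()
inq.write_one("rule a")
worker = UpperCaseWorker(inq, outq)
while worker.run_once():
    pass
print(outq.read_one())  # RULE A
```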
fswatcher
- class threatingestor.extras.fswatcher.FSWatcher(patterns=None, ignore_patterns=None, ignore_directories=False, case_sensitive=False)
  Bases: watchdog.events.PatternMatchingEventHandler, threatingestor.extras.queueworker.QueueWorker
  Watch a directory for YARA rule changes.
  Send contents of the changed rule files to the queue.
  - patterns = ['*.yar', '*.yara', '*.rule', '*.rules']
  - process(event): Handle a file event.
  - on_modified(event): Called when a file or directory is modified.
    Parameters: event (DirModifiedEvent or FileModifiedEvent) – Event representing file/directory modification.
  - on_created(event): Called when a file or directory is created.
    Parameters: event (DirCreatedEvent or FileCreatedEvent) – Event representing file/directory creation.
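The default patterns decide which changed files count as YARA rules. The sketch below approximates that filtering with fnmatch.fnmatchcase against the file name. It honors the case_sensitive=False default by lower-casing both sides. It is only an approximation: watchdog's PatternMatchingEventHandler does its own matching against event paths, and is_rule_file is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePath

PATTERNS = ['*.yar', '*.yara', '*.rule', '*.rules']


def is_rule_file(path, patterns=PATTERNS, case_sensitive=False):
    """Return True if the file name matches any rule-file pattern."""
    name = PurePath(path).name
    if case_sensitive:
        return any(fnmatchcase(name, p) for p in patterns)
    return any(fnmatchcase(name.lower(), p.lower()) for p in patterns)


print(is_rule_file("/rules/APT_Loader.YAR"))  # True
print(is_rule_file("/rules/notes.txt"))       # False
```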
pasteprocessor
- class threatingestor.extras.pasteprocessor.PasteProcessor
  Bases: threatingestor.extras.queueworker.QueueWorker
  Read pastebin URLs from a queue, write raw content to a queue.
  - do_work(job): From a paste URL, get the raw contents.
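Getting the raw contents of a paste usually starts by rewriting the paste URL to its raw form. The sketch below assumes pastebin's `https://pastebin.com/raw/<id>` convention and handles no other paste sites. raw_paste_url is a hypothetical helper. A full do_work would then download that URL (for example with urllib.request.urlopen) and return the body; the download step is omitted here.

```python
from urllib.parse import urlsplit


def raw_paste_url(url):
    """Map https://pastebin.com/<id> to https://pastebin.com/raw/<id>.

    Assumption: pastebin's /raw/<id> convention; other paste sites return None.
    """
    parts = urlsplit(url)
    path = parts.path.strip("/")
    if parts.hostname not in ("pastebin.com", "www.pastebin.com") or not path:
        return None
    paste_id = path.split("/")[-1]
    return f"https://pastebin.com/raw/{paste_id}"


print(raw_paste_url("https://pastebin.com/AbCd1234"))      # https://pastebin.com/raw/AbCd1234
print(raw_paste_url("https://pastebin.com/raw/AbCd1234"))  # https://pastebin.com/raw/AbCd1234
print(raw_paste_url("https://example.org/x"))              # None
```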