Descriptor Reader

Utilities for reading descriptors from local directories and archives. This is mostly done through the DescriptorReader class, which iterates over the descriptor data found in a list of paths (files, directories, or tarballs). For example…

from stem.descriptor.reader import DescriptorReader

my_descriptors = [
  '/tmp/server-descriptors-2012-03.tar.bz2',
  '/tmp/archived_descriptors/',
]

# prints the contents of all the descriptor files
with DescriptorReader(my_descriptors) as reader:
  for descriptor in reader:
    print(descriptor)

This ignores files that cannot be processed due to read errors or unparsable content. To be notified of skipped files you can register a listener with register_skip_listener().
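As a rough sketch of how a skip listener might be used: the example below assumes the listener is called with the skipped file's path and the exception describing why it was skipped, which is how Stem documents register_skip_listener(). The names record_skip, read_and_report, and skipped are hypothetical helpers for illustration, not part of Stem.

```python
import collections

# Skipped files grouped by exception class name, for instance
# 'ParsingFailure' or 'AlreadyRead'. (Hypothetical helper state.)
skipped = collections.defaultdict(list)


def record_skip(path, exception):
    """Skip listener: remember which file was skipped and why."""
    skipped[type(exception).__name__].append((path, str(exception)))


def read_and_report(paths):
    """Print every descriptor under paths, then return the skips by reason."""
    # Imported lazily so record_skip can be exercised without Stem installed.
    from stem.descriptor.reader import DescriptorReader

    reader = DescriptorReader(paths)
    reader.register_skip_listener(record_skip)

    with reader:
        for descriptor in reader:
            print(descriptor)

    return dict(skipped)
```

Calling read_and_report(['/tmp/archived_descriptors/']) would then print the descriptors and return a dictionary mapping each skip reason to the affected files.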

The DescriptorReader keeps track of the last modified timestamps for descriptor files that it has read so it can skip unchanged files if run again. This listing of processed files can also be persisted and applied to other DescriptorReader instances. For example, the following prints descriptors as they’re changed over the course of a minute, and picks up where it left off if run again…

import time

from stem.descriptor.reader import (
  DescriptorReader,
  load_processed_files,
  save_processed_files,
)

reader = DescriptorReader(['/tmp/descriptor_data'])

try:
  processed_files = load_processed_files('/tmp/used_descriptors')
  reader.set_processed_files(processed_files)
except (IOError, TypeError):
  pass  # could not load or parse, maybe this is the first run

start_time = time.time()

while (time.time() - start_time) < 60:
  # prints any descriptors that have changed since last checked
  with reader:
    for descriptor in reader:
      print(descriptor)

  time.sleep(1)

save_processed_files('/tmp/used_descriptors', reader.get_processed_files())
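To illustrate the idea behind the last modified tracking, here is a minimal standard-library sketch. It is not Stem's implementation: changed_files is a hypothetical function, and the mapping of absolute path to integer unix timestamp is an assumption about what such a listing could look like.

```python
import os


def changed_files(paths, processed):
    """Return the paths whose last modified time differs from the record.

    processed maps absolute path -> last modified unix timestamp (int).
    It is updated in place, so a second call skips unchanged files.
    """
    changed = []

    for path in paths:
        path = os.path.abspath(path)

        try:
            mtime = int(os.stat(path).st_mtime)
        except OSError:
            continue  # missing or unreadable file, nothing to read

        if processed.get(path) == mtime:
            continue  # same timestamp as last time, so skip it

        processed[path] = mtime
        changed.append(path)

    return changed
```

Persisting the processed dictionary between runs is what lets a later run pick up where the previous one left off.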

Module Overview:

load_processed_files - Loads a listing of processed files
save_processed_files - Saves a listing of processed files

DescriptorReader - Iterator for descriptor data on the local file system
  |- get_processed_files - provides the listing of files that we've processed
  |- set_processed_files - sets our tracking of the files we have processed
  |- register_read_listener - adds a listener for when files are read
  |- register_skip_listener - adds a listener that's notified of skipped files
  |- start - begins reading descriptor data
  |- stop - stops reading descriptor data
  |- __enter__ / __exit__ - manages the descriptor reader thread in the context
  +- __iter__ - iterates over descriptor data in unread files

FileSkipped - Base exception for a file that was skipped
  |- AlreadyRead - We've already read a file with this last modified timestamp
  |- ParsingFailure - Contents can't be parsed as descriptor data
  |- UnrecognizedType - File extension indicates non-descriptor data
  +- ReadFailed - Wraps an error that was raised while reading the file
     +- FileMissing - File does not exist

Deprecated since version 1.8.0: This module will likely be removed in Stem 2.0 due to lack of usage. If you use this module please let me know.