== Overview of the GIMS Capture Daemon ==

The capture node software has two basic components: capture-daemon
and capd_proxy.py. The proxy handles all XML/RPC communication with
the GIMS AM and UI, and manages starting any necessary capture
processes and the resulting capture files.

Specific instructions for installing and running the capture daemon and proxy are given below.

'''capture-daemon'''

----

The capture daemon can be run as a standalone process, but is normally
started by capd_proxy. capd_proxy should run unprivileged, so
capture-daemon should be installed setuid root (or setuid to a
privileged user with appropriate access to network devices via
libpcap) so that it has the permissions needed to open capture
devices. It can be installed anywhere, but should be placed on a file
system with enough space for temporary measurement files.

The capture daemon currently supports the following packet transformations:
* prefix-preserving IP address anonymization
* sampling (1-in-N or uniform probabilistic sampling)
* aggregation into either simple byte/packet counts or flow records. By default, no
aggregation is done and you'll get libpcap files containing raw packet headers.

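The two sampling modes can be illustrated with a short sketch (illustrative Python only, not the daemon's actual implementation):

```python
import random

def one_in_n(packets, n):
    # Deterministic 1-in-N sampling: keep every Nth packet.
    return [p for i, p in enumerate(packets) if i % n == 0]

def uniform_probabilistic(packets, rate, rng=random):
    # Uniform probabilistic sampling: keep each packet independently
    # with probability `rate`.
    return [p for p in packets if rng.random() < rate]

kept = one_in_n(list(range(10)), 2)  # -> [0, 2, 4, 6, 8]
```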
capd_proxy starts a separate capture-daemon process for each active
experiment. While there is some overhead in running multiple capture
processes simultaneously, this architecture simplifies demultiplexing
the appropriate packets for a given experiment. It also provides a
level of isolation among experiments, and allows a separate O&M
capture process to run alongside any active experiment process (e.g.,
in order to capture *all* traffic on the wire).

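In Python terms, the per-experiment process model looks roughly like the sketch below. The `-i`/`-f`/`-w` flags are hypothetical placeholders, not capture-daemon's actual command line:

```python
import subprocess

def start_capture(binary, interface, bpf_filter, outfile):
    # Launch one independent capture process for an experiment.
    # The flags shown are placeholders for the real daemon's options.
    return subprocess.Popen([binary, "-i", interface, "-f", bpf_filter, "-w", outfile])

# One process per active experiment, plus an O&M process capturing everything:
# procs = {eid: start_capture("capture-daemon", "eth0", flt, eid + ".pcap")
#          for eid, flt in [("exp1", "port 80"), ("oam", "")]}
```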
The only software dependency for the basic functionality of
capture-daemon is a working libpcap library.

For flow aggregation into IPFIX records, libyaf and libfixbuf are
required. They are available at
http://tools.netsa.cert.org/index.html. These tools should be
compiled and installed before compiling the capture-daemon code.
They have their own prerequisites, notably glib 2.6.4 or
later. See the yaf/fixbuf documentation for more details.

From inside the capture-daemon subdirectory, run configure and make,
then sudo make install. (make install simply makes the capture-daemon
binary suid root so that it has permission to put a network device
into promiscuous mode. Alternatively, run capd_proxy.py as root.)

'''capd_proxy.py'''

----

Requires Python 2.6.

capd_proxy handles all XML/RPC functions for configuring, starting,
and stopping experiments. There are also interfaces for testing the
storage capabilities of an experiment and for gathering information
on running experiments.

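As a sketch of the XML/RPC interaction, the snippet below runs a stand-in server and calls one method over XML/RPC. The method name `configureExperiment` is hypothetical, and on Python 2.6 the modules are `SimpleXMLRPCServer` and `xmlrpclib` rather than the Python 3 names used here:

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer  # SimpleXMLRPCServer on Python 2
from xmlrpc.client import ServerProxy         # xmlrpclib on Python 2

# A stand-in for the proxy: register one hypothetical method.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda eid: "configured " + eid, "configureExperiment")
port = server.server_address[1]

# Serve exactly one request in the background for this demo.
t = threading.Thread(target=server.handle_request)
t.start()

client = ServerProxy("http://127.0.0.1:%d" % port)
result = client.configureExperiment("exp1")
t.join()
server.server_close()
```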
Presently, storage functionality is included for:
* sftp
* Amazon S3

All storage functionality resides in capd_storage.py, which is
designed to make adding new storage repositories relatively easy.

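One common way to structure such pluggable backends is a small registry of backend classes; this is a sketch only, and capd_storage.py's actual class layout may differ:

```python
class StorageBackend(object):
    """Base class for storage repositories; subclass and register to add one."""
    registry = {}

    @classmethod
    def register(cls, name):
        def deco(subcls):
            cls.registry[name] = subcls
            return subcls
        return deco

    def upload(self, local_path):
        raise NotImplementedError

@StorageBackend.register("sftp")
class SftpBackend(StorageBackend):
    def upload(self, local_path):
        return "sftp://host/" + local_path  # placeholder for a real transfer

@StorageBackend.register("s3")
class S3Backend(StorageBackend):
    def upload(self, local_path):
        return "s3://bucket/" + local_path  # placeholder for a real transfer

backend = StorageBackend.registry["s3"]()
```

A new repository type then only needs a subclass and a `register` call; nothing else in the proxy has to change.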
The capture daemon proxy starts a separate process to handle each
storage type (s3, ssh, and local storage). This storage process
checks for new files that can be uploaded, and also annotates the
existing metadata for raw capture files prior to upload. Each
individual file transfer is handled in a transient (Python) thread to
avoid blocking the entire process on a single transfer. (Note that
some work remains to be done on propagating errors from storage
uploads to the UI and experimenters.) Storage functionality is almost
entirely housed in the capd_storage.py module.

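The one-thread-per-transfer pattern can be sketched as follows (illustrative only, not capd_storage.py's actual code):

```python
import threading

def upload_all(paths, do_upload):
    # Spawn a short-lived thread per file so that one slow transfer
    # does not block the others.
    done, lock = [], threading.Lock()

    def worker(path):
        result = do_upload(path)
        with lock:
            done.append(result)

    threads = [threading.Thread(target=worker, args=(p,)) for p in paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done

uploaded = upload_all(["a.pcap", "b.pcap"], lambda p: p)
```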
The software dependencies of capd_proxy are related to its storage
capabilities: the boto Python library is required for s3, and paramiko
and pyCrypto are required for ssh. Each of these libraries is easy to
install (see the depend subdirectory), but capd_proxy will start
successfully even if they are not present; you simply won't have the
affected storage capabilities. (On Debian-based Linuxes, just run:
apt-get install python-paramiko python-crypto python-boto.)

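That graceful degradation amounts to probing for each optional library at startup, roughly like this sketch:

```python
# Probe for optional storage libraries; a missing one simply disables
# the corresponding backend instead of aborting startup.
available = {}
for backend, module in [("s3", "boto"), ("ssh", "paramiko")]:
    try:
        __import__(module)
        available[backend] = True
    except ImportError:
        available[backend] = False
```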
There are a few options for starting capd_proxy.py (python
capd_proxy.py -h will show them). If capture-daemon is suid root or
capd_proxy.py is running as root, you should simply be able to run
"python capd_proxy.py" to get started. The output logging will
immediately show which storage capabilities have been found (via
installed Python libraries). A simple script (runproxy.sh) is
supplied to do a basic startup of the proxy. This causes the proxy to
listen on any locally configured IP address, on port 8001.