Welcome to bulk_extractor!

To install on a Linux, macOS, or MinGW system, use:
   $ ./configure
   $ make
   $ sudo make install

The following directories will NOT be installed with the above commands:

    python/   - bulk_extractor Python tools.
                Copy them where you wish and run them directly.
                These tools are experimental.

    plugins/  - This is for C/C++ developers only. You can develop your own
                bulk_extractor plugins, which are loaded at run time
                if the .so or .dll files are in the same directory as
                the bulk_extractor executable.

	
================================================================
bulk_extractor produces the following kinds of outputs:


email.txt - All of the email addresses found.
  email_histogram - a histogram of the email addresses

url.txt - All of the URLs found in the source.
  url_histogram - a histogram of the URLs
  url_searches - a histogram of the search terms extracted from the URLs
  url_services - a histogram of the domain names mentioned in the URLs.

tcp.txt - Evidence of all TCP connections, found with the TCP carver.
  tcp_histogram - a histogram of the TCP connections.


domain.txt - A list of all the hostnames extracted from the source. This
  includes domains extracted from email addresses, domains extracted
  from URLs, and dotted-quad notations for IP addresses.
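
The histogram outputs listed above are, conceptually, counts of how often
each extracted feature occurs. The Python sketch below illustrates that idea
only; it is not bulk_extractor's implementation. It assumes (this README does
not say so) that each feature line is tab-separated as offset, feature,
context, and that lines starting with "#" are comments. The helper name
histogram() and the sample addresses are hypothetical.

```python
from collections import Counter


def histogram(lines):
    """Count how often each feature occurs in an iterable of feature lines.

    Assumed line layout (an assumption, not documented in this README):
        offset <TAB> feature <TAB> context
    Blank lines and lines beginning with '#' are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not enough columns to hold a feature
        counts[fields[1]] += 1
    # Most frequent first; ties keep first-seen order.
    return counts.most_common()


# Made-up sample data, for illustration only.
sample = [
    "# comment line",
    "100\talice@example.com\tcontext a",
    "250\tbob@example.com\tcontext b",
    "900\talice@example.com\tcontext c",
]

for feature, n in histogram(sample):
    print(f"n={n}\t{feature}")
```

The same function could be pointed at an open file object, since it only
needs an iterable of lines.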


Proposed command set for the bulk_extractor config file:


ignore email <pat>       Ignores email addresses that match the pattern <pat>.
                         Patterns may use "*" to match any run of characters
                         and "?" to match a single character.

                         examples: ignore email simsong@*
                                   ignore email sim*@mit.edu
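
To make the wildcard semantics concrete, here is a small Python sketch that
translates a pattern into a regular expression and tests addresses against
it. It is an illustration only: the command set is still a proposal, and the
sketch assumes that a pattern must match the whole address and that matching
is case-sensitive. The names wildcard_to_regex() and is_ignored() are
hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern: '*' -> any run, '?' -> one character."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts))


def is_ignored(address, patterns):
    """Return True if the whole address matches any wildcard pattern."""
    return any(wildcard_to_regex(p).fullmatch(address) for p in patterns)


patterns = ["simsong@*", "sim*@mit.edu"]
print(is_ignored("simsong@example.com", patterns))  # True
print(is_ignored("simple@mit.edu", patterns))       # True
print(is_ignored("alice@mit.edu", patterns))        # False
```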


ignore emailre <reg>     Ignores email addresses that match the regular
                         expression <reg>.

                         examples: ignore emailre simsong@.*
                                   ignore emailre sim.*@mit.edu
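
The regular-expression form is more expressive but easier to get subtly
wrong. The Python sketch below uses Python's re module to show one pitfall:
an unescaped "." matches any character. It assumes that the expression must
match the whole address. The regex dialect bulk_extractor would use may
differ from Python's, and is_ignored_re() is a hypothetical name.

```python
import re


def is_ignored_re(address, regexes):
    """Return True if the whole address matches any regular expression."""
    return any(re.fullmatch(rx, address) for rx in regexes)


regexes = [r"simsong@.*", r"sim.*@mit.edu"]
print(is_ignored_re("simsong@example.com", regexes))  # True
print(is_ignored_re("simon@mit.edu", regexes))        # True
# The unescaped "." in "mit.edu" matches any character, so this also matches:
print(is_ignored_re("simon@mitXedu", regexes))        # True
# Escaping the dot restricts the match to a literal ".":
print(is_ignored_re("simon@mitXedu", [r"sim.*@mit\.edu"]))  # False
```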

================================================================

Compiling bulk_extractor:
*************************
bulk_extractor builds with the GNU autotools. The maintainer has
previously run automake and autoconf to produce the script
"configure". This script *should* be able to set up the build of
bulk_extractor for your platform.

See INSTALL for general information on the GNU autotools.

The following libraries are optional. They add features or improve
performance:

* pthreads   - enables the thread pool (without it bulk_extractor runs
               in 'striping' mode, which is slower)
* libewf     - for reading EnCase E01 evidence files
* afflib     - for reading AFF evidence files
* exiv2      - for decoding JPEG Exif metadata
* regex      - usually included with the release
* openssldev - for crypto primitives (not required on Windows; we use CAPI)


Compiling under Windows:
************************

There are three ways to compile for Windows:
1 - Cross-compiling from a Linux or Mac system with MinGW.
2 - Compiling natively on Windows using MinGW.
3 - Compiling natively on Windows using Cygwin (untested).

Cross-compiling from Linux or Mac using MinGW:
**********************************************
* Cross-compiling works, but the cross-compiler toolchain does not include
  the version 4.x GCC compiler, and pthreads does not appear to work properly.
* We used to build with MinGW cross-compiling, but that created problems
  with multi-threading.


Compiling natively under Windows with MinGW:
********************************************

  Download the Windows Server 2003 Resource Kit tools from:
  http://www.microsoft.com/downloads/details.aspx?familyid=9d467a69-57ff-4ae7-96ee-b18c4790cffd&displaylang=en

  Download and run mingw-get-inst-20101030.exe (or whatever version is current),
  selecting all options, including these:

    C Compiler, C++ Compiler, MSYS Basic System, MinGW Development Toolkit.

  When selecting the installation path for MinGW, do not choose a path
  that contains spaces.

  Start the MinGW32 shell window.

  Update the repository catalog and install the modules required
  by MinGW by typing the following:

  mingw-get update
  mingw-get install g++
  mingw-get install pthreads
  mingw-get install mingw32-make
  mingw-get install zlib
  mingw-get install libz-dev

  Install the libraries required by bulk_extractor, and then bulk_extractor itself, in this order:
    * expat
    * openssl
    * libewf  (be sure to configure --enable-winapi=yes)
    * afflib
    * regex
    * bulk_extractor

  For each library:
   - download
   - ./configure --prefix=/usr/local/ --enable-winapi=yes
   - make
   - make install

   For openssl, run "./config --prefix=/usr/local" rather than configure.

   Do not build under your home directory if its path contains a space!
   Libtool doesn't handle paths with spaces in them.

  If OpenSSL is installed in /usr/local/ssl, you may need to build
  other libraries with:  

  ./configure CPPFLAGS="-I/usr/local/include -I/usr/local/ssl/include" \
              LDFLAGS="-L/usr/local/lib -L/usr/local/ssl/lib"
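
  When several libraries live under different prefixes, the flags above
  follow a simple pattern: one -I<prefix>/include and one -L<prefix>/lib per
  prefix. The Python sketch below composes such a command line and rejects
  prefixes that contain spaces, which libtool does not handle. It is only an
  illustration; configure_command() is a hypothetical helper, and it assumes
  each prefix uses the conventional include/ and lib/ subdirectories.

```python
import shlex


def configure_command(prefixes):
    """Build a ./configure argument list from a list of install prefixes."""
    for p in prefixes:
        if " " in p:
            # Libtool does not handle paths with spaces in them.
            raise ValueError(f"prefix contains a space: {p!r}")
    cppflags = " ".join(f"-I{p}/include" for p in prefixes)
    ldflags = " ".join(f"-L{p}/lib" for p in prefixes)
    return ["./configure", f"CPPFLAGS={cppflags}", f"LDFLAGS={ldflags}"]


cmd = configure_command(["/usr/local", "/usr/local/ssl"])
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(cmd))
```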

  Most libraries will install in /usr/local/; you may need to add
  -I/usr/local/include to CFLAGS and -L/usr/local/lib to LDFLAGS in
  your make scripts.

  Running the resulting executable is still problematic, though.
  Unless you link with -static, it will have a lot of DLL
  references. Most of the DLLs are installed in /usr/local/bin/*.dll,
  /bin/*.dll, and elsewhere, which typically map to
  c:\mingw\msys\1.0\local\bin and c:\mingw\bin\


To make the installer you will require WiX. Learn more about it at:
 * http://wix.codeplex.com/
 * http://www.dalun.com/wix/01.09.2005.htm
 * http://www.tramontana.co.hu/wix/

Install and use WiX as follows: [TBD]


Compiling natively on Windows using cygwin (untested):
*****************************************************

Cygwin:
 * Go to cygwin.org. Download and run Setup.
 * Install these packages:
   g++ 4.0
   autoconf
   automake
   libexif-devel
   libopenssl098
   subversion
   openssh

