Hello everyone!
Today is a great day to release some projects into the wild!

AgriCatch is a data aggregation tool I've built on top of Django.
What AgriCatch does is pretty simple: it lets you grab data from wild, disorganized websites (from their HTML, of course).

It supports:

  • XPath - use XPath expressions to locate the individual fields
  • Pagination - if you'd like to grab a list of things that are paginated, that's also possible
  • Custom functions - if you'd like to do something special with the fields before saving them to the DB
  • Leftover event-related functionality - AgriCatch was originally designed for events, so some event-specific functionality remains:
    • days_on_page - the number of days covered by the events on a single page (for example, 7 for a weekly page)
    • start_day & num_of_days - you can give the importer a default starting day; it will then fill that date into the URL (according to a format specified in the website template). For example, with the template:
      http://www.example_events.com?date=%m-%d-%Y
      the importer would start at start_day and move forward days_on_page days at a time until it reached the limit (num_of_days).
  • HTML & XML support
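To make the start_day / num_of_days / days_on_page behaviour more concrete, here is a minimal sketch of how this kind of date-stepping URL expansion could work. It is not AgriCatch's actual code. The helper name page_urls is hypothetical, and the sketch assumes two things: the template URL uses strftime-style placeholders such as %m-%d-%Y, and paging stops once the day offset reaches num_of_days.

```python
from datetime import date, timedelta
from typing import List


def page_urls(url_template: str, start_day: date, days_on_page: int,
              num_of_days: int) -> List[str]:
    """Hypothetical helper: one URL per page, stepping days_on_page days at a time."""
    if days_on_page <= 0:
        raise ValueError("days_on_page must be positive")
    urls = []
    offset = 0
    while offset < num_of_days:  # stop once the num_of_days limit is reached
        day = start_day + timedelta(days=offset)
        # strftime fills %m, %d and %Y in the template; a literal '%' in the
        # URL (e.g. '%20') would need to be escaped as '%%'
        urls.append(day.strftime(url_template))
        offset += days_on_page
    return urls


# A weekly page (days_on_page=7) imported for 14 days gives two page URLs
for url in page_urls("http://www.example_events.com?date=%m-%d-%Y", date(2013, 1, 1), 7, 14):
    print(url)
# http://www.example_events.com?date=01-01-2013
# http://www.example_events.com?date=01-08-2013
```

Stepping by whole pages rather than single days means each page is fetched only once, even when it covers a full week of events.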

Most of the useful documentation for building importers can be found in the repository, in agricatch/website.py.
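For a rough feel of what an importer does, here is a hypothetical sketch of the general idea. It combines three pieces: an XPath expression per field, an optional custom function per field, and a "next page" link followed for pagination. None of these names come from AgriCatch: extract_items, crawl_pages, the field-spec format and the fetch callable are all illustrative. For parsing, the sketch substitutes the standard library's xml.etree.ElementTree. That module supports only a subset of XPath and needs well-formed markup, so real-world HTML would more likely go through a full HTML parser such as lxml.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical field spec: field name -> (XPath relative to each item, optional custom function)
Fields = Dict[str, Tuple[str, Optional[Callable[[str], object]]]]


def extract_items(root: ET.Element, item_xpath: str, fields: Fields) -> List[dict]:
    """Pull one record per item node, applying each field's custom function if given."""
    records = []
    for node in root.findall(item_xpath):
        record = {}
        for name, (xpath, func) in fields.items():
            text = (node.findtext(xpath) or "").strip()  # missing field -> ""
            record[name] = func(text) if func else text
        records.append(record)
    return records


def crawl_pages(first_url: str, fetch: Callable[[str], str], item_xpath: str,
                fields: Fields, next_xpath: str, max_pages: int = 50) -> List[dict]:
    """Follow 'next page' links, collecting items from every page (with a loop guard)."""
    records, seen, url = [], set(), first_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        root = ET.fromstring(fetch(url))  # fetch is a stand-in for an HTTP request
        records.extend(extract_items(root, item_xpath, fields))
        link = root.find(next_xpath)
        href = link.get("href") if link is not None else None
        url = urljoin(url, href) if href else None  # resolve relative links
    return records


# Usage with two in-memory pages standing in for a paginated site
PAGES = {
    "http://www.example_events.com/events?page=1":
        '<html><body><div class="event"><h2> Farmers market </h2>'
        '<span class="price">5</span></div>'
        '<a class="next" href="http://www.example_events.com/events?page=2">next</a></body></html>',
    "http://www.example_events.com/events?page=2":
        '<html><body><div class="event"><h2>Harvest fair</h2>'
        '<span class="price">12</span></div></body></html>',
}

events = crawl_pages(
    "http://www.example_events.com/events?page=1",
    PAGES.__getitem__,
    item_xpath=".//div[@class='event']",
    fields={"title": ("h2", None), "price": ("span[@class='price']", int)},
    next_xpath=".//a[@class='next']",
)
print(events)  # [{'title': 'Farmers market', 'price': 5}, {'title': 'Harvest fair', 'price': 12}]
```

Passing fetch in as a parameter keeps the parsing logic testable offline. The seen set and max_pages guard stop a site whose "next" link loops back on itself.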
To run an import, simply use the command:

python manage.py doimport website_name --days=7

website_name refers to the name of the website in lowercase!

More info to come...