Monday 29 July 2013

Nutch training course

We are planning to run a 2-day training course on Apache Nutch on 24/25 October 2013. It will take place in Bristol, UK (the exact venue will be announced later).

The course has been put on hold for now. Please do get in touch if you are interested, and I will let you know as soon as we have a sufficient number of attendees.

The course will cover nearly every aspect of Nutch, from installation and configuration to writing custom resources, for both Nutch 1.x and 2.x. Students will learn best practices for running and managing a Nutch crawl.

Attendees should have some knowledge of Java and be comfortable using the command line to execute basic commands. Some understanding of Hadoop is a plus but not a strict requirement. The course will include hands-on exercises, so bring your laptop! Note that the demonstrations and exercises will be based on Linux.

The programme given here is indicative only and may change slightly. Feel free to suggest topics you would like to learn about during the course.

Day 1: NUTCH BASICS

  • Basic setup
  • Compilation and dependencies
  • Main concepts and operational steps
  • Nutch data structures
  • Parsing
  • Indexing
  • Scoring
  • Best practices for development and production

Day 2: ADVANCED NUTCH

  • Plugin architecture
  • Politeness and performance
  • Metadata in Nutch
  • Advanced use cases
  • Introduction to Nutch 2.x

Please contact us on course@digitalpebble.com if you have a question or want to be kept informed of the next date for this course.

7 comments:

  1. Is it free? Would it be available as a webinar?

  2. The price will be announced soon, but it definitely won't be free. As for a webinar, the answer is no (can you imagine a 2-day-long webinar?). Attendees will be given a copy of all the material (slides, code, etc.) at the end of the course.

  3. Hi Julien,
    Great to see you pushing this on.
    It may also be worth getting in touch with some of the IR teams at the University of Glasgow and Strathclyde? They are IR daft there and I'm sure Web Search is very much their cup of tea.
    Thanks for the heads-up on user@nutch.
    Best
    Lewis

  4. It's great to hold such an event. It would be even better if you could make it available to the wider community of users outside the UK, say, by recording the class and selling it online. Unfortunately, there aren't many resources on Nutch on the Web, and the documentation isn't that helpful for non-experts.

  5. Hi Arian, I've been thinking about writing a book on Nutch, but so far I am not convinced that it would be financially worth the time, as the audience for it is quite small. What I might do, however, is reuse the course material for a book later, or maybe indeed a video. We'll see first whether there is enough of an audience for the course.

  6. Are there any more updates on the course?

  7. Please contact us on course@digitalpebble.com if you have a question or want to be kept informed of the next date for this course.
