Monday 19 September 2016

What's new in StormCrawler 1.1

The 1.1 release comes 2 months after the previous one and is relatively lightweight by comparison. The main changes are:


Dependency upgrades

Jackson Databind (2.6.6) and Apache Storm (1.0.2)

Core

  • HTTP protocol: store response headers verbatim in metadata (#317) - used by the WARC module
  • FetcherBolt - added option to throttle based on number of URLs in queues (#311)
  • Conventional 'never-refetch' Date for nextFetchDate (#331)
  • Added metadata.lastProcessedDate
  • Deprecated StatusStreamBolt and copied as DummyIndexer

Elasticsearch

  • Bugfix: flush BulkProcessor before closing connection (#320)
  • Added per day / month metrics consumer (#327)

Archetype

  • Generate real jar name in README 
  • Proper handling of user-provided package names (#326)
  • POM for projects from archetypes should use Java 8 (#325)

There have been several minor changes as well. 

Remember that you can get regular updates about major commits on the project by following us on Twitter @stormcrawlerapi.

BTW there should be an exciting announcement in the next couple of weeks about a cool use of StormCrawler by a high-profile user. Watch this space!

As usual, thanks to all contributors and users and happy crawling!

PS: If you are near Bristol, you might be interested in coming to the talk I'll be giving at Bristech on Oct 6th.

NOTE

A patch release, 1.1.1, was published on 21st Sept and includes #335 (thanks to Jeff Bolle for pointing it out).




