ADA Lab blog

Sparkling-ferns for #ApacheSpark (Part 1: The Algorithm)

17 November 2015 by Piotr Dendek

Two weeks ago, together with Mateusz Fedoryszak, I attended the first European Spark Summit (#SparkSummitEU). What did we find there, and how did we enrich the Spark community? Let me tell you the story of the summit and Sparkling Ferns...
PhD defense of ADA Laber Mateusz Kobos

18 June 2015 by Mateusz Kobos

On 2015-06-11, I defended my PhD thesis entitled "Multiresolution classification using combination of density estimators" at the Systems Research Institute of the Polish Academy of Sciences.
CERMINE wins award at ESWC 2015

17 June 2015 by Łukasz Bolikowski

ADA Lab's CERMINE participated in the Semantic Publishing challenge during the recent Extended Semantic Web Conference (ESWC 2015) in Portorož, Slovenia and we won the Best Performing Approach Award!
Introducing ADA Lab Open Science APIs

11 May 2015 by Mateusz Fedoryszak

Having our roots in the Centre for Open Science (CeON), we're very keen on making sure that anyone interested can take advantage of the algorithms we design. Today we are taking another step in that direction: we are introducing ADA Lab Open Science APIs.
Text Mining Services in OpenAIRE

16 February 2015 by Łukasz Bolikowski and Mateusz Kobos

The OpenAIRE2020 project recently had an impressive kick-off in Athens, during which we presented OpenAIRE’s plans in the area of text and data mining of scholarly publications. Publications contain all kinds of rich information which, although understandable to a human reader, is not machine-readable and thus cannot be used directly for indexing and recommendation. Authors’ affiliations, document classifications, references to biological and chemical databases, acknowledgem...
Let's join FORCEs and make a difference in scholarly communication

02 February 2015 by Dominika Tkaczyk

Two weeks ago I participated in FORCE2015 in Oxford. It was the third conference organized by the FORCE11 community and a must-attend event for people interested in scholarly communication, in particular its problems and the various ways of addressing them.
Kraków – where AI meets the law

22 December 2014 by Michał Łopuszyński

Recently, I was lucky enough to participate in the JURIX 2014 conference, held in Kraków on 10-12 December 2014. The event aimed to bring the advances of computer science into the legal domain, and I must admit that the organizers really achieved their goal. At least from my strongly computer-scientish perspective... During the conference, I presented a proof-of-concept study on how to detect and analyze topical trends in public procurement judgments. You can have a look ...
Spark, D3, data visualization and Super Cow Powers

26 November 2014 by Mateusz Fedoryszak

Did you know that the amount of milk given by a cow depends on the number of days since its last calving? A plot of this relationship is called a lactation curve. Read on to learn how we use Apache Spark and D3 to find out how much milk we can expect on a particular day.
Affiliation parsing in CERMINE

13 November 2014 by Dominika Tkaczyk

CERMINE is our Java library for extracting metadata from scientific literature. Among other information, CERMINE extracts the authors of the input document and their affiliations, and it associates authors with affiliations. Recently, a new feature has been added: affiliation parsing.
Interview with Michael Jordan about machine learning, big data, and other things

27 October 2014 by Mateusz Kobos

Recently, IEEE Spectrum interviewed Michael Jordan, a leading researcher in machine learning. He gave his view on the hype around machine learning and big data analysis, and shared his opinion on several other interesting issues (the technological singularity, P=NP, the Turing test).
Paperity chooses CERMINE as its content extraction engine

23 October 2014 by Selcuk Ayguney & Marcin Wojnarski

This is a guest post by Selcuk Ayguney and Marcin Wojnarski, creators of Paperity. We invited the authors to share their reasons for choosing ADA Lab's (recently awarded) CERMINE as their content extraction engine. Here's their story.
Summer internship at ADA Lab

06 October 2014 by Jan Lasek

My name is Jan Lasek and I was an intern with the ICM ADA Lab team this summer. I must say that it was a great experience to work here!
Impressions from PolTAL 2014

30 September 2014 by Michał Łopuszyński

A couple of days ago, members of our lab participated in PolTAL 2014, a conference bringing together linguists, computer scientists, and other researchers involved in computational linguistics and natural language processing.
Mind the gap! – DL2014

23 September 2014 by Łukasz Bolikowski, Mateusz Fedoryszak & Dominika Tkaczyk

Recently a few people from our lab visited London to participate in Digital Libraries 2014, a joint event of TPDL and JCDL, the two best-known conferences on digital libraries.
Want to remember the Spark API or learn Scala? Use our courses on memrise.com

15 September 2014 by Piotr Dendek

You need 20 hours to get reasonably good at something and 10,000 hours to become an expert in any domain. Become an expert more easily and quickly!
datadr: split-apply-combine package for R backed by Hadoop

04 September 2014 by Mateusz Kobos

datadr is a package for the R programming language that provides split-apply-combine functionality for data transformation. See the Quickstart section in the project's documentation for a nice overview of the package's capabilities.
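To make the split-apply-combine idea concrete, here is a minimal single-machine sketch. It is plain Python and not datadr's R API: the helper name `split_apply_combine` and the sample readings are hypothetical. The records are split into groups by a key, a function is applied to each group, and the per-group results are combined into one dictionary. datadr performs the analogous steps on data distributed over Hadoop.

```python
from collections import defaultdict
from statistics import mean


def split_apply_combine(records, key, apply):
    """Hypothetical helper (not part of datadr): group records by key,
    apply a function to each group, and combine the results into a dict."""
    # Split: partition the records into groups that share the same key.
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    # Apply + combine: compute one result per group and collect them.
    return {k: apply(group) for k, group in groups.items()}


if __name__ == "__main__":
    # Hypothetical sample data: (city, temperature) readings.
    readings = [("Warsaw", 12.0), ("Krakow", 14.0), ("Warsaw", 16.0)]
    result = split_apply_combine(
        readings,
        key=lambda r: r[0],
        apply=lambda group: mean(t for _, t in group),
    )
    print(result)  # {'Warsaw': 14.0, 'Krakow': 14.0}
```

The appeal of the pattern is that the "apply" step only ever sees one group at a time, so groups can be processed independently and in parallel. That is what makes it a natural fit for a Hadoop backend. For datadr's actual functions, consult its Quickstart.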
CockroachDB: an open source version of Google Spanner

25 July 2014 by Mateusz Kobos

A team of ex-Googlers is building an open source version of Google Spanner, i.e., a transactional database that spans many data centers.
Building Apache Spark App with Maven

15 July 2014 by Artur Czeczko & Mateusz Fedoryszak

Recently we've been working on building Spark apps with Maven.
Data science workflow

13 June 2014 by Mateusz Kobos

A description of a data scientist's workflow, published on the CACM blog.
FUSE: project for mining game-changing technologies from scientific publications and patents

12 June 2014 by Mateusz Kobos

In May's Nature, there is a column about an interesting text mining project called FUSE. The project is backed by a US intelligence agency; its goal is to predict game-changing technologies by mining scientific publications and patent applications.
At an OpenAIREplus technical meeting in Pisa

06 June 2014 by Mateusz Kobos

Last week, three of us (Mateusz, Marek, Paweł) attended a technical meeting of the OpenAIREplus project in Pisa.
CERMINE wins Best Student Paper Award at DAS conference

11 April 2014 by Dominika Tkaczyk

The CERMINE system was presented yesterday at this year's Document Analysis Systems conference. Our article entitled "CERMINE - automatic extraction of metadata and references from scientific literature" won the ITESOFT Best Student Paper Award.
On big trouble with big data at TOK FM

10 April 2014 by Łukasz Bolikowski

Yesterday at TOK FM (a popular Polish talk radio station) I discussed with Cezary Łasiczka the recent FT.com article by Tim Harford titled "Big data: are we making a big mistake?".
Scoobi, Scalding, Spark, Stratosphere – ICM at Scalar 2014

05 April 2014 by Mateusz Fedoryszak & Michał Oniszczuk

We gave a talk about how we use Scala in ADA Lab at the Scalar 2014 conference.
Perfect Data Analysis for Every Moment – ICM at Spotify 2014

26 March 2014 by Piotr Jan Dendek, Mateusz Fedoryszak & Michał Oniszczuk

On Monday we started a one-week in-house collaboration with Spotify at its Stockholm HQ.
Visit at ScraperWiki

26 March 2014 by Dominika Tkaczyk

I spent last week in Liverpool visiting ScraperWiki. ScraperWiki provides tools for extracting, cleaning, analysing and managing data coming from various sources.
Article: Cloudera Oryx as the next Mahout

13 March 2014 by Mateusz Kobos

An interesting article on Gigaom.com reports that Cloudera is developing a system called Oryx, which aims to be a better Mahout.
Mathematical modelling workshop for talented youth

31 January 2014 by Łukasz Bolikowski

Every year for almost 20 years, in collaboration with the Polish Children's Fund, ICM has organized weekly workshops for talented youth. This year's edition has just finished.
Debugging and the manipulate function in RStudio

30 December 2013 by Mateusz Kobos

A note for R and RStudio enthusiasts about cool new features in the most recent version of RStudio (0.98), which I noticed today.
12-factor app

17 December 2013 by Mateusz Kobos

The 12-factor app is a manifesto, a set of good engineering practices for modern web applications (and not only for them), created by people at Heroku based on their extensive experience.
Facebook Presto

08 November 2013 by Mateusz Kobos

Facebook has just open-sourced Presto, its Hadoop solution for running SQL queries on big data.