-
Two weeks ago, together with Mateusz Fedoryszak, I attended the first European Spark Summit (#SparkSummitEU). What did we find there, and how did we enrich the Spark community? Let me tell you the story of the summit and Sparkling Ferns...
-
On 2015-06-11, I defended my PhD thesis entitled "Multiresolution classification using combination of density estimators" at the Systems Research Institute of the Polish Academy of Sciences.
-
ADA Lab's CERMINE participated in the Semantic Publishing challenge during the recent Extended Semantic Web Conference (ESWC 2015) in Portorož, Slovenia, and we won the Best Performing Approach Award!
-
Having our roots in the Centre for Open Science (CeON), we're very keen on making sure anybody interested can take advantage of the algorithms we design. Today we are taking another step in that direction: we introduce ADA Lab Open Science APIs.
-
Recently in Athens there was an impressive kick-off of the OpenAIRE2020 project, during which we presented OpenAIRE’s plans in the area of text and data mining of scholarly publications. Publications contain all kinds of rich information which, although understandable to a human reader, is not machine-readable and thus cannot be used directly for indexing and recommendation. Authors’ affiliations, document classifications, references to biological and chemical databases, acknowledgem...
-
Two weeks ago I participated in FORCE2015 in Oxford. It was the third conference organized by the FORCE11 community and a must-attend event for people interested in scholarly communication, in particular its problems and the various ways of addressing them.
-
Recently, I was lucky enough to participate in the JURIX 2014 conference, which took place in Kraków on 10-12 December 2014. This was an event aimed at injecting the advancements of computer science into the legal domain. I must admit that the organizers really achieved their goal. At least from my strongly computer-scientish perspective... During the conference, I presented a proof-of-concept study on how to detect and analyze topical trends in public procurement judgments. You can have a look ...
-
Did you know that the amount of milk given by a cow depends on the number of days since its last calving? A plot of this relationship is called a lactation curve. Read on to see how we use Apache Spark and D3 to estimate how much milk we can expect on a particular day.
-
CERMINE is our Java library for extracting metadata from scientific literature. Among other information, CERMINE extracts the authors of the input document and their affiliations, and associates authors with affiliations. Recently, new functionality has been added: affiliation parsing.
-
Recently, IEEE Spectrum interviewed Michael Jordan, a leading researcher in machine learning. He gave his view on the hype around machine learning and big data analysis, and shared his opinions on several other interesting topics (technological singularity, P=NP, Turing test).
-
This is a guest post by Selcuk Ayguney and Marcin Wojnarski, creators of Paperity. We invited the authors to share their reasons for choosing ADA Lab's (recently awarded) CERMINE as their content extraction engine. Here's their story.
-
My name is Jan Lasek and I was an intern with the ICM ADA Lab team over the summer. I have to say that working here was a great experience!
-
A couple of days ago, members of our lab participated in PolTAL 2014, a conference bringing together linguists, computer scientists, and other researchers involved in computational linguistics and natural language processing.
-
Recently a few people from our lab visited London to participate in Digital Libraries 2014, a joint edition of TPDL and JCDL, the two best-known conferences on digital libraries.
-
You need 20 hours to become reasonably good at something and 10,000 hours to become an expert in any domain. Become an expert more easily and quickly!
-
datadr is a package for the R programming language that provides split-apply-combine functionality for data transformation. See the Quickstart section in the project's documentation for a nice overview of the package's capabilities.
-
A team of ex-Googlers is building an open-source version of Google Spanner, i.e., a transactional database that spans many data centers.
-
Recently we've been working on building Spark apps with Maven.
-
A description of a data scientist's workflow, published on the CACM blog.
-
In May's Nature, there is a column about an interesting text mining project called FUSE. The project is backed by a US intelligence agency; its goal is to predict game-changing technologies by mining scientific publications and patent applications.
-
Last week, three of us (Mateusz, Marek, Paweł) attended a technical meeting of the OpenAIREplus project in Pisa.
-
The CERMINE system was presented yesterday at this year's Document Analysis Systems conference. Our article, entitled "CERMINE - automatic extraction of metadata and references from scientific literature", won the ITESOFT Best Student Paper Award.
-
Yesterday on TOK FM (a popular Polish talk radio station), I discussed with Cezary Łasiczka the recent FT.com article by Tim Harford titled "Big data: are we making a big mistake?".
-
We gave a talk about Scala in ADA Lab at the Scalar 2014 conference.
-
On Monday we started a one-week in-house collaboration with Spotify at its Stockholm HQ.
-
I spent last week in Liverpool visiting ScraperWiki. ScraperWiki provides tools for extracting, cleaning, analysing and managing data coming from various sources.
-
A quite interesting article on Gigaom.com says that Cloudera is developing a system called Oryx, which aims to be a better Mahout.
-
Every year for almost 20 years, in collaboration with the Polish Children's Fund, ICM has organized weekly workshops for talented youth. This year's edition has just finished.
-
A note for R and RStudio enthusiasts about cool new features in the most recent version of RStudio (0.98), which I noticed today.
-
The 12-factor app is a manifesto, or a set of good engineering practices, for modern web applications (but not only for them), created by people from Heroku based on their extensive experience.
-
Facebook has just open-sourced Presto, its Hadoop solution for running SQL queries on big data.