Introduction

Idea

BaseballCZ-Statistics aims to provide an easy-to-use Python API for automating statistics-computation pipelines on data from baseball.cz.

The API can either work directly with the data already stored on the remote server, or automatically download the current CSV files to the local machine and work with those.

Technologies Used

The API is built on several other Python libraries, which are used to speed up development.

Data Download and Remote Server

At present, baseball.cz does not provide an easy way to download data automatically. Selenium is therefore used to simulate clicking the download button on the statistics page. The CSV files are downloaded locally and then sent to a remote FTP server.
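
The download-and-upload step could look roughly like the sketch below. It is not the project's actual code: the function names, the CSS selector for the download button, and the use of headless Chrome are all assumptions made for illustration. Selenium is imported inside the download function so that the FTP helper, which needs only the standard library, can be used on its own.

```python
import time
from ftplib import FTP
from pathlib import Path


def download_csv(page_url: str, button_selector: str, download_dir: Path,
                 timeout: float = 30.0) -> Path:
    """Click a download button in headless Chrome and return the new CSV path.

    `button_selector` is a hypothetical CSS selector; the real page may need
    a different locator.
    """
    # Imported lazily so that upload_csv() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    download_dir.mkdir(parents=True, exist_ok=True)
    before = set(download_dir.glob("*.csv"))

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    # Ask Chrome to save downloads into our directory without a dialog.
    options.add_experimental_option(
        "prefs", {"download.default_directory": str(download_dir.resolve())}
    )
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(page_url)
        button = WebDriverWait(driver, timeout).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, button_selector))
        )
        button.click()
        # Poll until a CSV that was not there before shows up.
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            new_files = set(download_dir.glob("*.csv")) - before
            if new_files:
                return new_files.pop()
            time.sleep(0.5)
        raise TimeoutError("no CSV appeared in the download directory")
    finally:
        driver.quit()


def upload_csv(ftp: FTP, local_path: Path) -> str:
    """Upload one file in binary mode and return the FTP command that was sent."""
    command = f"STOR {local_path.name}"
    with local_path.open("rb") as handle:
        ftp.storbinary(command, handle)
    return command
```

A caller would typically open the connection with `with FTP(host) as ftp:`, call `ftp.login(user, password)`, and then pass `ftp` to `upload_csv`; the host and credentials are placeholders that depend on the deployment.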

To retrieve the data, the API uses the requests module to communicate with a remote Flask server, which sends back the loaded data.
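
As a rough illustration of the client side of this exchange, the sketch below requests one dataset and parses the response. The `/data/<dataset>` route, the CSV response body, and the function name `fetch_rows` are assumptions; the real server may expose different routes or return JSON. The optional `session` argument accepts any object with a requests-style `get()` method, which keeps the function testable without a network.

```python
import csv
import io


def fetch_rows(base_url: str, dataset: str, session=None, timeout: float = 10.0):
    """Request one dataset from the server and return it as a list of dicts.

    The "/data/<dataset>" route and the CSV payload are hypothetical.
    """
    if session is None:
        import requests  # imported lazily so a fake session can be injected
        session = requests.Session()
    url = f"{base_url.rstrip('/')}/data/{dataset}"
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    # Each CSV row becomes a dict keyed by the header line.
    return list(csv.DictReader(io.StringIO(response.text)))
```

Raising on HTTP errors and setting an explicit timeout keeps a pipeline from silently computing statistics on an empty or partial response.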

Statistics

Data received from the remote server are parsed into a pandas DataFrame. Together with NumPy, this lets the API compute the requested statistics in a vectorized way.
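
To show what vectorized computation means here, the sketch below derives three standard batting statistics for every player at once, without a Python loop over rows. The column names (`AB`, `H`, `2B`, `3B`, `HR`, `BB`, `HBP`, `SF`) and the function name are assumptions; the actual CSV headers from baseball.cz may differ. The formulas are the conventional definitions of batting average, on-base percentage, and slugging percentage.

```python
import numpy as np
import pandas as pd


def add_batting_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with AVG, OBP and SLG columns added.

    Column names are hypothetical; rows with a zero denominator get NaN.
    """
    out = df.copy()

    def col(name: str) -> np.ndarray:
        # Pull a column out as a float array so division behaves uniformly.
        return out[name].to_numpy(dtype=float)

    def safe_div(num: np.ndarray, den: np.ndarray) -> np.ndarray:
        # Divide element-wise, leaving NaN wherever the denominator is zero.
        result = np.full(num.shape, np.nan)
        np.divide(num, den, out=result, where=den != 0)
        return result

    ab, h = col("AB"), col("H")
    bb, hbp, sf = col("BB"), col("HBP"), col("SF")
    # Total bases: a single counts 1, double 2, triple 3, home run 4.
    total_bases = h + col("2B") + 2 * col("3B") + 3 * col("HR")

    out["AVG"] = safe_div(h, ab)
    out["OBP"] = safe_div(h + bb + hbp, ab + bb + hbp + sf)
    out["SLG"] = safe_div(total_bases, ab)
    return out
```

Because each statistic is a single array expression, the cost of the Python-level code does not grow with the number of players; the per-row work happens inside NumPy.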

The API computes most of the statistics described at baseball-stat.cz and can access all statistics provided in the CSV files.