sheet2graph command-line program

Total running time: 8 hours

We implement our own command in Python and distribute it with pip, so that anyone can install it with pip install sheet2graph.

The command takes spreadsheet files (csv, xlsx) as input and generates images (png, jpg, svg) from the data they contain.

Clone the repository

git clone https://github.com/fromzerotofullstack/sheet2graph

Then check out the branch for the episode you are following.

In every episode, the command to switch to its branch is shown in the top-right corner of the screen.
Get the code for the first episode with
git checkout load_csv

Episodes

Ep.1 - Introduction, repo features and environment (17 minutes)

git checkout load_csv

We explain how the project will be structured and what we are trying to accomplish.


• Setting up the project
• Configuring a virtual environment
• Making a simple command that outputs a string

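Here is a minimal sketch of a first command that outputs a string. The names `build_message` and `main` are hypothetical and do not come from the repository. Having `main()` return an exit status makes it easy to hook up later as a console-script entry point.

```python
from typing import List, Optional


def build_message() -> str:
    """Text printed by the first version of the command (hypothetical wording)."""
    return "sheet2graph: turning spreadsheets into graphs"


def main(argv: Optional[List[str]] = None) -> int:
    """Hypothetical entry point: print a string and return an exit status."""
    # argv is unused for now; accepting it keeps the signature stable when options are added.
    print(build_message())
    return 0  # 0 is the conventional "success" exit status


main()
```

You can call it from an `if __name__ == "__main__":` guard with `raise SystemExit(main())`, which turns the return value into the process exit status.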

Ep.2 - Loading csv files (8 minutes)

git checkout load_csv

We create a virtual environment and set up the project.

• Organizing commands with a Makefile
• Creating our Virtualenv
• Installing packages
• Using requirements.txt
• Git branching


Ep.3 - First graphs (19 minutes)

git checkout first_graphs

We save our first graph from the spreadsheet data.

• Installing Plotly to generate graphs
• Freezing requirements with pip freeze
• Solving dependency problems
• Using Pandas to do simple data processing


Ep.4 - Saving in a folder (5 minutes)

git checkout save_in_folder

We will create a folder to save our image to.

• Saving the image in a folder programmatically
• Using pathlib
• Adding the folder to .gitignore

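The sketch below creates the output folder programmatically with pathlib. The helper name `output_path` and the folder layout are assumptions for illustration. The standard-library call `Path.mkdir(parents=True, exist_ok=True)` does the actual work.

```python
import tempfile
from pathlib import Path


def output_path(folder: str, filename: str) -> Path:
    """Hypothetical helper: ensure `folder` exists and return the full image path."""
    directory = Path(folder)
    # parents=True also creates missing intermediate folders;
    # exist_ok=True makes running the command twice harmless.
    directory.mkdir(parents=True, exist_ok=True)
    return directory / filename


# Demonstration inside a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    target = output_path(f"{tmp}/output", "graph.png")
    print(target.name, target.parent.is_dir())  # graph.png True
```

The folder itself (for example `output/`) would then be listed in .gitignore so that generated images are not committed.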

Ep.5 - First command-line options (19 minutes)

git checkout first_options

We will add our first command line flags/options.

• Using argparse to parse command-line arguments and generate help
• Optional and required flags
• Adding input file and graph type options
• Generating different graph types
• Autogenerated help

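The sketch below shows how argparse separates required and optional flags. The flag names `-i/--input` and `-t/--type` and the graph types listed in `choices` are assumptions for illustration. They are not necessarily the ones the finished command uses. argparse generates `-h/--help` automatically.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser with a required input file and an optional graph type."""
    parser = argparse.ArgumentParser(
        prog="sheet2graph",
        description="Generate graph images from spreadsheet files.",
    )
    # Illustrative flag names. A required flag: argparse prints an error and exits if it is missing.
    parser.add_argument("-i", "--input", required=True, help="path to a .csv file")
    # An optional flag restricted to a fixed (assumed) set of values, with a default.
    parser.add_argument(
        "-t", "--type",
        choices=["line", "bar"],
        default="line",
        help="graph type (default: %(default)s)",
    )
    return parser


args = build_parser().parse_args(["--input", "data.csv", "--type", "bar"])
print(args.input, args.type)  # data.csv bar
```

With this parser, `sheet2graph --help` would list both options with their help strings. Omitting `--input` makes argparse print an error and exit with status 2.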

Ep.6 - Output options (22 minutes)

git checkout output_options

Here we add an option for different output locations.

• Using argparse to parse command-line arguments and generate help
• Combining options to output to a given filename and folder
• Precedence of command flags

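One way to combine a filename option and a folder option is a single function that encodes the precedence rules. The rules in this sketch are assumptions for illustration, not the course's exact behaviour:

- An absolute filename wins.
- Otherwise, the filename goes inside the chosen folder, or inside the default folder if none is given.

The defaults `output` and `graph.png` are also assumptions.

```python
from pathlib import Path
from typing import Optional

DEFAULT_FOLDER = "output"   # assumed default folder
DEFAULT_NAME = "graph.png"  # assumed default filename


def resolve_output(filename: Optional[str] = None, folder: Optional[str] = None) -> Path:
    """Hypothetical precedence rules for the output filename and folder options."""
    name = Path(filename) if filename else Path(DEFAULT_NAME)
    if name.is_absolute():
        # An absolute filename already says exactly where to write: ignore the folder.
        return name
    # A relative filename goes inside the folder option, or the default folder.
    base = Path(folder) if folder else Path(DEFAULT_FOLDER)
    return base / name


print(resolve_output().as_posix())                        # output/graph.png
print(resolve_output("sales.svg", "reports").as_posix())  # reports/sales.svg
```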

Ep.7 - Output format (13 minutes)

git checkout output_format

We add Scalable Vector Graphics (SVG) output support.

• Add several output formats (svg, jpg, png) to our command

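One simple design is to pick the format from the output filename's extension, as sketched below. The helper name, the fallback to PNG and the acceptance of `.jpeg` are assumptions. Plotly's `write_image` also accepts an explicit `format` argument, so the value returned here could be passed along. Static image export from Plotly also needs an export engine such as kaleido installed.

```python
from pathlib import Path

SUPPORTED_FORMATS = ("png", "jpg", "svg")


def output_format(path: str, default: str = "png") -> str:
    """Hypothetical helper: derive the image format from the output filename."""
    suffix = Path(path).suffix.lower().lstrip(".")
    if not suffix:
        return default  # no extension given: fall back to the (assumed) default format
    if suffix == "jpeg":
        suffix = "jpg"  # treat both common JPEG extensions the same way
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported output format {suffix!r}; choose one of {', '.join(SUPPORTED_FORMATS)}"
        )
    return suffix


print(output_format("graph.SVG"))  # svg
```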

Ep.8 - Generated image size (6 minutes)

git checkout output_size

We will add output options to specify the size of the generated graphs.

• New option for output size
• Default and custom graph sizes

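A size option needs both a default and a way to parse custom values. This sketch assumes a `WIDTHxHEIGHT` format (for example `800x600`) and a default of 1200x800 pixels. Both are illustrative choices, not necessarily the command's. The resulting pair could be passed as the `width` and `height` arguments of Plotly's `write_image`.

```python
from typing import Optional, Tuple

DEFAULT_SIZE = (1200, 800)  # assumed default width and height, in pixels


def parse_size(value: Optional[str]) -> Tuple[int, int]:
    """Parse an assumed 'WIDTHxHEIGHT' option value such as '800x600'."""
    if not value:
        return DEFAULT_SIZE  # option not given: use the default size
    try:
        width_text, height_text = value.lower().split("x")
        width, height = int(width_text), int(height_text)
    except ValueError:
        # Wrong number of parts or non-numeric parts both end up here.
        raise ValueError(f"size must look like 800x600, got {value!r}") from None
    if width <= 0 or height <= 0:
        raise ValueError(f"width and height must be positive, got {value!r}")
    return width, height


print(parse_size("800x600"))  # (800, 600)
print(parse_size(None))       # (1200, 800)
```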

Ep.9 - Refactoring and type checking (22 minutes)

git checkout refactor_and_types

We start refactoring what we have so far and add type annotations.

• Add basic annotations to functions
• Use Typeguard to enforce the annotations at runtime
• Annotations as a kind of documentation
• Add a new type of graph: the scatter graph

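Annotations document what a function expects, and a runtime checker turns them into errors. With Typeguard, you do this by decorating the function with `@typechecked`. The sketch below is a deliberately tiny stand-in written with the standard library only, to show the idea. It is not Typeguard's implementation. Unlike Typeguard, it only understands plain classes such as `str` or `int` and skips typing generics like `List[str]`. `repeat_label` is a hypothetical example function.

```python
import functools
import inspect
from typing import get_type_hints


def check_types(func):
    """Stand-in (not Typeguard): raise TypeError when a value does not match a plain-class annotation."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    def check(name, value):
        expected = hints.get(name)
        # Only plain classes are checked; typing generics such as List[str] are skipped.
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(
                f"{func.__name__}: {name} must be {expected.__name__}, got {type(value).__name__}"
            )

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            check(name, value)
        result = func(*args, **kwargs)
        check("return", result)
        return result

    return wrapper


@check_types
def repeat_label(label: str, times: int) -> str:
    """Hypothetical example function whose annotations double as documentation."""
    return label * times


print(repeat_label("ab", 2))  # abab
```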

Ep.10 - Reading Excel files (16 minutes)

git checkout read_excel

We will add support for Excel files (.xlsx).

• Adding dependencies: openpyxl and xlrd for Excel support
• Fix type errors


Ep.11 - Reading a file from Google Drive (15 minutes)

git checkout read_gdrive

In addition to a local file (.csv, .xlsx), we accept a Google Drive public document as input.

• Parsing the URL of the Google Drive document
• Adding the input option transparently for the user

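The sketch below accepts a shared Google Sheets link through the same input option in three steps:

1. Detect the link.
2. Extract the document id.
3. Build a CSV export URL that a CSV reader can download.

The URL shapes used here (`.../spreadsheets/d/<id>/...` and `.../export?format=csv`) are the usual Google Sheets forms. They are assumptions, not taken from the course code. pandas' `read_csv` accepts such a URL as well as a local path. The document has to be shared publicly for the export to work without signing in.

```python
import re
from typing import Optional

# Assumed shape of a Google Sheets link: https://docs.google.com/spreadsheets/d/<id>/edit...
_SHEET_ID = re.compile(r"docs\.google\.com/spreadsheets/d/([A-Za-z0-9_-]+)")


def google_sheet_csv_url(location: str) -> Optional[str]:
    """Return a CSV export URL for a Google Sheets link, or None for anything else."""
    match = _SHEET_ID.search(location)
    if match is None:
        return None  # not a Google Sheets link: treat it as a local file path
    return f"https://docs.google.com/spreadsheets/d/{match.group(1)}/export?format=csv"


def resolve_input(location: str) -> str:
    """Hypothetical: what the input option hands to the loader, either a URL or a local path."""
    return google_sheet_csv_url(location) or location


print(resolve_input("data/sales.csv"))  # data/sales.csv
```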

Ep.12 - First tests (16 minutes)

git checkout first_tests

We will set up testing in our project to check the features we have implemented.

• The unittest module in Python
• Add a test target to the Makefile
• Test loaders
• Assertions in testing
• Adding tests cache to .gitignore

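A first test module with unittest could look like the sketch below. `is_spreadsheet` is a hypothetical function under test. The test case is collected with `unittest.TestLoader` and run with `TextTestRunner`. From a Makefile, a test target could instead run `python -m unittest discover`, which finds files named `test*.py` by default.

```python
import unittest
from pathlib import Path

SPREADSHEET_SUFFIXES = {".csv", ".xlsx"}


def is_spreadsheet(path: str) -> bool:
    """Hypothetical function under test: does the path look like a supported input file?"""
    return Path(path).suffix.lower() in SPREADSHEET_SUFFIXES


class IsSpreadsheetTest(unittest.TestCase):
    def test_accepts_csv_and_xlsx(self):
        self.assertTrue(is_spreadsheet("data.csv"))
        self.assertTrue(is_spreadsheet("DATA.XLSX"))  # the extension check is case-insensitive

    def test_rejects_other_files(self):
        self.assertFalse(is_spreadsheet("notes.txt"))
        self.assertFalse(is_spreadsheet("csv"))  # no extension at all


suite = unittest.TestLoader().loadTestsFromTestCase(IsSpreadsheetTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```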

Ep.13 - Testing helpers (16 minutes)

git checkout testing_helpers

Once we have the infrastructure for testing, we will write a few helpers to make the tests more concise and easier to write.

• Setup and teardown in a test suite
• Using shutil to deal with filesystem operations
• Checking that a file is created in a specific path
• Checking the size of an image with PIL
• Running a command line application from a test

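The helpers listed above can be combined in one test case:

- `setUp` creates a fresh temporary folder.
- `tearDown` removes it with `shutil.rmtree`.
- One test checks that a file exists and has the expected image size.

Two substitutions keep the example self-contained, and both are assumptions. First, `png_size` reads the width and height straight from the PNG header instead of using PIL; with PIL, `Image.open(path).size` gives the same pair. Second, the command run through `subprocess` is the Python interpreter itself rather than sheet2graph.

```python
import shutil
import struct
import subprocess
import sys
import tempfile
import unittest
from pathlib import Path
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path: Path) -> Tuple[int, int]:
    """Read (width, height) from a PNG's IHDR header; a stdlib stand-in for PIL."""
    with open(path, "rb") as handle:
        header = handle.read(24)
    if header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a PNG file")
    # Width and height are big-endian unsigned 32-bit integers at bytes 16-23.
    width, height = struct.unpack(">II", header[16:24])
    return width, height


class OutputHelpersTest(unittest.TestCase):
    def setUp(self):
        # Every test starts with its own empty output folder.
        self.folder = Path(tempfile.mkdtemp())

    def tearDown(self):
        # Remove the folder and everything the test wrote into it.
        shutil.rmtree(self.folder)

    def test_file_exists_with_expected_size(self):
        target = self.folder / "graph.png"
        # Only the signature and IHDR header are written: enough for png_size, not a full image.
        target.write_bytes(
            PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 640, 480)
        )
        self.assertTrue(target.is_file())
        self.assertEqual(png_size(target), (640, 480))

    def test_running_a_command(self):
        # sys.executable stands in for the installed command in this sketch.
        completed = subprocess.run(
            [sys.executable, "-c", "print('ok')"],
            capture_output=True, text=True, cwd=self.folder,
        )
        self.assertEqual(completed.returncode, 0)
        self.assertEqual(completed.stdout.strip(), "ok")


suite = unittest.TestLoader().loadTestsFromTestCase(OutputHelpersTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```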

Ep.14 - Writing tests (32 minutes)

git checkout tests_implementation

Once the testing infrastructure and helpers are ready, we write the tests for our command-line application.

• Testing input files
• Testing location of output files
• Testing generated image sizes
• Testing graph types


Ep.15 - Print version (24 minutes)

git checkout version_print_data

We will be making our command a bit friendlier, improving the default behaviour with informative messages for the user.

• New option to print the command's help
• Print version of the command
• Default behaviour: running the command with no flags prints the version and exits
• Testing the output of our command with os.system and subprocess

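argparse's built-in `version` action covers printing the version. An early check for an empty argument list gives the default behaviour described above. In this sketch, the version string, the `-i/--input` flag and the placeholder message are assumptions for illustration. `main` returns an exit status so that tests can call it directly. To test the installed command instead, `subprocess.run([...], capture_output=True, text=True)` would capture its output.

```python
import argparse
import sys
from typing import List, Optional

__version__ = "0.1.0"  # placeholder version string


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="sheet2graph")
    # action="version" prints the string below and exits with status 0.
    parser.add_argument("-v", "--version", action="version", version=f"%(prog)s {__version__}")
    parser.add_argument("-i", "--input", help="spreadsheet file to graph (illustrative flag)")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    if argv is None:
        argv = sys.argv[1:]
    if not argv:
        # Default behaviour: no flags at all -> print the version and exit successfully.
        print(f"sheet2graph {__version__}")
        return 0
    args = build_parser().parse_args(argv)
    print(f"would graph {args.input}")  # placeholder for the real work
    return 0


main([])  # prints: sheet2graph 0.1.0
```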

Ep.16 - Print data (33 minutes)

git checkout print_data

Until now we have used hardcoded column names; in this episode we start making the spreadsheet processing generic, so that our command works with any spreadsheet file. The first step is to print the data of the input file, so the user can preview it.

• Print-only option to print the input file provided by the user
• Transforming the data with Pandas to index it by letter and 1-based integer (as in spreadsheet applications like Excel)
• Adding tests for the new indexing by letters and integer for columns and rows

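Spreadsheet-style labels come from two steps: converting a 0-based column position into letters (A, ..., Z, AA, ...) and numbering rows from 1. The sketch below shows the conversion on plain lists.

With pandas, the same function could relabel a DataFrame:

- `df.columns = [column_letter(i) for i in range(len(df.columns))]`
- `df.index = range(1, len(df) + 1)`

`column_letter` is a hypothetical name and the data is illustrative. Treating the header row as row 1, as Excel does, is an assumption about how the course counts rows.

```python
def column_letter(index: int) -> str:
    """Hypothetical helper: 0-based column position -> spreadsheet letters (0 -> 'A', 26 -> 'AA')."""
    if index < 0:
        raise ValueError("column index must be non-negative")
    letters = ""
    number = index + 1  # spreadsheet columns are effectively 1-based
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


# Illustrative data: the header row becomes row 1 (an assumption, as in Excel).
rows = [["month", "sales"], ["jan", 10], ["feb", 12]]
cells = {
    f"{column_letter(col)}{row_number}": value
    for row_number, row in enumerate(rows, start=1)
    for col, value in enumerate(row)
}
print(cells["A1"], cells["B3"])  # month 12
```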

Ep.17 - Testing data selection (1 hour 3 minutes)

git checkout select_data_tests

For each axis, we will allow the user to use expressions like 'b4,b5,b6,b7' or 'B4:B7' to select cells or ranges to graph.

• Add options '-x' and '-y' to select the data to be graphed
• Making the expressions case-insensitive
• Implementing a comma separated selection option
• Implementing a range selection option
• Adding tests first and making them pass after the implementation, as in Test-Driven Development (TDD)
• Better and more informative user messages in case of error
• Verifying and fixing problems in our Pandas implementation

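Both selection styles can be expanded into an explicit list of cells before any data is read. The sketch below:

- upper-cases the expression, so 'b4' and 'B4' are equivalent;
- supports comma-separated cells and single-row or single-column ranges;
- raises a ValueError with a readable message for anything else.

The function names are hypothetical. The range handling assumes the start cell comes first.

```python
import re
from typing import List, Tuple

# Hypothetical helpers; a cell reference is letters followed by digits, e.g. B4.
_CELL = re.compile(r"([A-Z]+)([0-9]+)")


def column_number(letters: str) -> int:
    """'A' -> 1, 'Z' -> 26, 'AA' -> 27."""
    number = 0
    for char in letters:
        number = number * 26 + (ord(char) - ord("A") + 1)
    return number


def column_letters(number: int) -> str:
    """Inverse of column_number: 1 -> 'A', 27 -> 'AA'."""
    letters = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def split_cell(cell: str) -> Tuple[str, int]:
    """'b4' -> ('B', 4); case-insensitive."""
    match = _CELL.fullmatch(cell.strip().upper())
    if match is None:
        raise ValueError(f"invalid cell {cell!r}: expected a reference like 'B4'")
    return match.group(1), int(match.group(2))


def expand_selection(expression: str) -> List[str]:
    """Expand 'b4,b5' or 'B4:B7' (or 'B4:D4') into upper-case cell references."""
    if ":" not in expression:
        return [f"{col}{row}" for col, row in map(split_cell, expression.split(","))]
    parts = expression.split(":")
    if len(parts) != 2:
        raise ValueError(f"invalid range {expression!r}: expected something like 'B4:B7'")
    (first_col, first_row), (last_col, last_row) = split_cell(parts[0]), split_cell(parts[1])
    if first_col == last_col and first_row <= last_row:  # vertical range, e.g. B4:B7
        return [f"{first_col}{row}" for row in range(first_row, last_row + 1)]
    start, end = column_number(first_col), column_number(last_col)
    if first_row == last_row and start <= end:  # horizontal range, e.g. B4:D4
        return [f"{column_letters(n)}{first_row}" for n in range(start, end + 1)]
    raise ValueError(f"unsupported range {expression!r}: use one row or one column, start cell first")


print(expand_selection("b4,b5,b6,b7"))  # ['B4', 'B5', 'B6', 'B7']
print(expand_selection("B4:B7"))        # ['B4', 'B5', 'B6', 'B7']
```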

Ep.18 - Adding labels to graphs (25 minutes)

git checkout select_data_implementation

We check that everything is working so far and add extra options to set the axis labels to custom, user-defined values.

• Debugging broken tests and making all tests pass after all our changes
• Adding x label and y label options to our command, to specify the labels of the horizontal and vertical axes
• Debugging column types in pandas
• Using exceptions to deal with unreliable cases

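Unreliable cells are often ones that look numeric but are stored as text, or are empty. This sketch assumes the selected cells arrive as a plain list of values. Each value is converted with `float()` inside `try`/`except`, and a failure becomes a message naming the offending position. `axis_label` shows an assumed rule for the new label options: a user-supplied label wins, otherwise a fallback such as the selection expression is used. With pandas, `pd.to_numeric(series, errors="coerce")` is an alternative: it turns bad cells into NaN instead of raising.

```python
from typing import Iterable, List, Optional


def axis_values(cells: Iterable[object]) -> List[float]:
    """Hypothetical helper: convert selected cell values to floats, naming any bad cell."""
    numbers = []
    for position, value in enumerate(cells, start=1):
        try:
            numbers.append(float(value))
        except (TypeError, ValueError):
            # TypeError covers empty cells (None); ValueError covers text such as "n/a".
            raise ValueError(
                f"value #{position} ({value!r}) is not a number; check the cells selected for this axis"
            ) from None
    return numbers


def axis_label(custom: Optional[str], fallback: str) -> str:
    """Assumed rule: a user-supplied label wins, otherwise use the fallback text."""
    return custom if custom else fallback


print(axis_values(["1", 2, " 3.5 "]))  # [1.0, 2.0, 3.5]
```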

Ep.19 - Distributing our command (57 minutes)

git checkout distribution

We start preparing our command for distribution on PyPI, so that it can be installed with pip, and we test the distribution in a test environment.

• Finding a proper name for our command
• Folder structure and necessary files
• Changes in the entry point of our program
• Writing a good README file in Markdown
• Examples and documentation
• Choosing a license
• Dependencies and versioning
• setup.cfg and setup.py


Ep.20 - Packaging and uploading (20 minutes)

git checkout distribution

We have all the files. We will test the distribution at test.pypi.org before publishing it to the real repository, so that we can fix any mistakes first.

• How to test in the test.pypi.org environment
• Using Twine to distribute your module
• Creating a new account on the test server (test.pypi.org)
• Secure tokens in .pypirc or interactively for testing
• Increasing the version number with each code change
• Fixing errors and testing the new version in the test environment
• Problems with images as documentation in the README file

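PyPI and test.pypi.org refuse a file whose name matches one already uploaded. Re-uploading the same version therefore fails, which is why the version number has to go up before every new upload. A tiny helper for plain MAJOR.MINOR.PATCH strings is sketched below. It is a hypothetical convenience, not part of Twine or setuptools, and it rejects pre-release versions such as `1.0.0rc1`.

```python
def bump_patch(version: str) -> str:
    """Hypothetical helper: '0.1.3' -> '0.1.4'; expects exactly MAJOR.MINOR.PATCH with integer parts."""
    try:
        major, minor, patch = (int(part) for part in version.split("."))
    except ValueError:
        # Covers both a wrong number of parts and non-integer parts such as '0rc1'.
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}") from None
    return f"{major}.{minor}.{patch + 1}"


print(bump_patch("0.1.9"))  # 0.1.10
```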

Ep.21 - Testing install (19 minutes)

git checkout distribution

After testing at test.pypi.org, we fix an error with the example image in the documentation and distribute our command 'sheet2graph' to the real production environment at pypi.org.

• Fixing an error with images in README.md at test.pypi.org
• Creating a new account on the production server (pypi.org)
• New secure tokens in .pypirc or interactively for production
• Solving problems with secure tokens


Ep.22 - Uploading to production (6 minutes)

git checkout distribution

In this episode we put everything together and install our own command by typing 'pip install sheet2graph'. We test it both in a virtual environment and globally. We also talk about what it means to be able to distribute the code you develop to the world so easily, something that should now feel a lot more approachable. As always, we need to be mindful to publish useful, tested code and, in general, to play well within the ecosystem.

If you made it this far, I hope you enjoyed it. Subscribe or connect with me on Twitter for updates on fromzerotofullstack.

• Solving problems with secure tokens
• Installing our new command using pip
• Testing on a new virtual environment
• Etiquette of publishing your modules and libraries