AboutCode project

This page contains the details of a technical writing project accepted for Google Season of Docs.

Project summary

Open source organization:
AboutCode
Technical writer:
ayansinha
Project name:
Reference for Command Line Options in scancode-toolkit and Reorganize the structure of AboutCode documentation at aboutcode.readthedocs.io
Project length:
Standard length (3 months)

Project description

[ 1. Scancode-Toolkit Command Line Options ]

Scancode-Toolkit has a host of Command Line options to customize how the scan is performed, the output format and several other behaviors such as post-scan plugins. These options currently have no proper documentation and are only described by the output of the “--help” or “-h” flag. This project aims to write complete documentation that explains:

[ 1. All the Options available through Command Line ]

  • Goal: An exhaustive list of all possible options through the command line.
  • Basic Overview: First, the default scan options are discussed, with an example of the output and a short graphic/description of how the scan is performed.
    This default behavior then acts as a reference for how the other options change the scan and the output.
    Those options are discussed in detail, with the information described in the following sections.

[ 2. Initiate Versioning Structure ]

  • Goal: Initiate a versioning system to properly maintain cross-release options/API and documentation changes.
  • Problem: Presently, the documentation in the wiki and on the ReadTheDocs pages covers older releases and needs major restructuring.
  • Basic Overview: The parts of scancode-toolkit that change, or could change, between versions are:
    • Command Line Options
    • APIs
    • Documentation (to be initiated)
    The command line options and the APIs change between versions and releases, and the documentation has to follow or it will create massive confusion for the users. The command line help [ --help ] is already updated whenever the options change, so it can be used to replicate the versioning in the documentation.
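
One way to follow option changes across releases is to extract the option names from the [ --help ] text of two versions and compare them. The following is a minimal sketch under stated assumptions: the two help excerpts are made-up samples, not real scancode output, and `extract_options` and `diff_options` are hypothetical helper names introduced only for illustration.

```python
import re

# Matches long options such as "--json-pp" that are not part of a longer word.
OPTION_RE = re.compile(r"(?<![\w-])--[a-z][a-z0-9-]*")


def extract_options(help_text):
    """Return the set of long option names found in a help text."""
    return set(OPTION_RE.findall(help_text))


def diff_options(old_help, new_help):
    """Return (added, removed) option names between two help texts."""
    old, new = extract_options(old_help), extract_options(new_help)
    return sorted(new - old), sorted(old - new)


# Illustrative help excerpts only (assumption); not copied from any release.
OLD_HELP = """
  -l, --license     Scan for licenses.
  --format FORMAT   Set the output format.
"""
NEW_HELP = """
  -l, --license     Scan for licenses.
  --json FILE       Write scan output as JSON to FILE.
"""

added, removed = diff_options(OLD_HELP, NEW_HELP)
print("added:", added)
print("removed:", removed)
```

A comparison like this could flag which reference pages need attention for a given release; the pages themselves would still be reviewed by hand.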

[ 3. How these Options can be used in different cases ]

  • Goal: This section will provide a basic summary of how the scan results of scancode-toolkit can be used in different cases, and the Scancode-Toolkit options that provide such functionality.
  • Basic Overview: This section gives different use case scenario examples and what options are recommended in those scenarios.
  • Note: This part requires significant help from the mentor in terms of inputs about and pointers to various use cases of Scancode-Toolkit.

[ 4. What these Options change in the Scan and the Output ]

  • Goal: This section will provide a basic summary of how the scan results of scancode-toolkit can be used in different cases, and the AboutCode tools that provide such functionality.
  • Basic Overview: The options change the behavior of how the scan is performed. A basic default case will be illustrated in the leading section [ 1. All the Options available through Command Line ] and this section will compare the changes that all the options bring to this default scenario.

[ 5. Output Formats and their examples ]

  • Goal: This section will provide a basic summary of how the scan results of scancode-toolkit can be used in different cases, and the AboutCode tools that provide such functionality.
  • Basic Overview: Scancode-Toolkit has flags to specify different output formats in which the scan results will be generated. These are -
    This part will
    • explain the output formats in detail
    • give examples of the output formats
    • give other links corresponding to each output format and its use
    • explain how scan results are stored in the output files. This also links to how these different formats are generated, which will be explained in [ 2. Discussions explaining Code Scanning ].
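
As an illustration of the kind of example this part could include, the sketch below reads a JSON scan result and lists the scanned files. It rests on assumptions: the sample uses a simplified layout with a top-level "files" list whose entries carry "path" and "type" keys, real output contains many more fields, and `list_scanned_files` is a hypothetical helper name.

```python
import json

# Simplified, assumed layout of a JSON scan result; real output has more fields.
SAMPLE_SCAN = """
{
  "files": [
    {"path": "samples/README", "type": "file"},
    {"path": "samples/zlib", "type": "directory"},
    {"path": "samples/zlib/adler32.c", "type": "file"}
  ]
}
"""


def list_scanned_files(scan_json):
    """Return the paths of all entries whose type is "file"."""
    scan = json.loads(scan_json)
    entries = scan.get("files", [])
    return [entry["path"] for entry in entries if entry.get("type") == "file"]


paths = list_scanned_files(SAMPLE_SCAN)
for path in paths:
    print(path)
```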

[ 6. Business Use of Scancode Output Formats ]

  • Goals: Explain the business use cases of Scancode output formats. In the GSoD ideas list, Scancode Output Formats is mentioned as a reference idea; this section implements it.
  • Note: This part requires significant help from the mentor in terms of inputs about and pointers to various business use cases of Scancode-Toolkit.

[ 7. How these outputs are used by other AboutCode projects for more analysis ]

  • Goal: This section will provide a basic summary of how the scan results of scancode-toolkit can be used in different cases, and the AboutCode tools that provide such functionality.
  • Basic Overview:
    • Scancode-Workbench: This part explains visualizing results with the desktop app and points to the scancode-workbench documentation for more support. Required documentation will be added to scancode-workbench if necessary.
    • Deltacode: How scancode results are used by Deltacode to determine file-level differences between two codebases.
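
The idea of a file-level difference between two codebases can be sketched as follows. This is not Deltacode's actual algorithm or API; it is a minimal illustration that assumes each scan has been reduced to a mapping from file path to content checksum, and `file_level_delta` is a hypothetical helper name.

```python
def file_level_delta(old_files, new_files):
    """Classify paths as added, removed, modified or unmodified.

    Both arguments map a file path to a content checksum.
    """
    delta = {"added": [], "removed": [], "modified": [], "unmodified": []}
    for path in sorted(set(old_files) | set(new_files)):
        if path not in old_files:
            delta["added"].append(path)
        elif path not in new_files:
            delta["removed"].append(path)
        elif old_files[path] != new_files[path]:
            delta["modified"].append(path)
        else:
            delta["unmodified"].append(path)
    return delta


# Hypothetical path -> checksum mappings, as could be read from two scans.
old_scan = {"a.c": "111", "b.c": "222", "c.c": "333"}
new_scan = {"a.c": "111", "b.c": "999", "d.c": "444"}

delta = file_level_delta(old_scan, new_scan)
print(delta)
```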

[ 2. Reorganize the structure of AboutCode Documentation ]

This part includes a host of changes to the AboutCode documentation.

[ 1. Versioning system ]

In [ 1. Scancode-Toolkit Command Line Options -> 2. Initiate Versioning Structure ] the issue of versioning the command line options is mentioned. The same is necessary for other parts of the documentation that contain version-specific commands/information, which would otherwise create confusion.

[ 2. Setting Documentation Standards and Tests ]

The documentation already has tests for sphinx-build (builds all the pages and checks for Sphinx syntax errors throughout) and link check (checks all the links from the documentation to other webpages), run in Continuous Integration through Travis-CI (added by me in Pull Request #17). It now needs more checks for reStructuredText linting and other standards. This could be achieved with restructuredtext-lint, but that needs more research and will be done as a part of my GSoD project.
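
To give an idea of what an additional lint check could look like, the sketch below flags section titles whose underline is shorter than the title text. It is a hand-rolled, standard-library-only illustration, not the restructuredtext-lint API; `find_short_underlines` and the sample text are hypothetical, and the set of underline characters it recognizes is a simplifying assumption.

```python
import re

# A line made of one repeated adornment character, e.g. "=====" or "-----".
UNDERLINE_RE = re.compile(r'^([=\-~^"#*+])\1+$')


def find_short_underlines(rst_text):
    """Return (title_line_number, title) pairs with a too-short underline."""
    problems = []
    lines = rst_text.splitlines()
    for index in range(1, len(lines)):
        title = lines[index - 1].strip()
        underline = lines[index].rstrip()
        # Skip blank lines and lines that are themselves adornments.
        if not title or UNDERLINE_RE.match(title):
            continue
        if UNDERLINE_RE.match(underline) and len(underline) < len(title):
            # The title sits on 1-based line number `index`.
            problems.append((index, title))
    return problems


# Made-up sample: the first underline is too short, the second is fine.
SAMPLE = "Getting Started\n=====\n\nInstall\n-------\n"

problems = find_short_underlines(SAMPLE)
for line_number, title in problems:
    print(f"line {line_number}: underline too short for {title!r}")
```

A check of this kind could run in the same Continuous Integration job as the existing build and link checks.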

[ 3. Adding a “Getting Started” Section ]

This will act as a starting section for newcomers and will contain a compilation of the most basic and important documents to get started with AboutCode projects. Every AboutCode project will have this section, including Scancode-Toolkit, Scancode-Workbench, Deltacode, and others.

[ 4. Restructuring According to the 4 Document Functions ]

The existing documentation isn’t explicitly structured around the 4 document functions: Tutorials, How To’s, Reference and Explanations. I propose to structure it accordingly, adding more information/explanations/pointers wherever necessary. This holds for all the AboutCode projects and their documentation. Below are two examples of the Scancode-Toolkit documentation restructuring I propose and would like to carry out in this project. Similar changes will be made to the rest of the documentation.

[ 5. Restructuring the Development Page (Scancode-Toolkit) ]

More info on the Code/APIs could be added to make the page more developer friendly. There can be links to the [ 2. Discussions explaining the Code Scanning ] section above, which links the explanation of how the scan works to the code that performs the scan. For example, the following folders contain different parts of scancode-toolkit; the use of each can be elaborated with its APIs, in conjunction with the discussion on how scancode works.

  • [ cluecode : plugins for scanning licenses, copyrights, urls, emails ]
  • [ commoncode : helper classes and functions ]
  • [ extractcode : extracts different archive formats ]
  • [ formattedcode : output formatting for different output file formats ]
  • [ licensedcode : license detection code ]
  • [ packagedcode : parsing various package formats ]
  • [ plugincode : classes for the plugins architecture ]
  • [ summarycode : summarizes scan on detected licenses ]
  • [ textcode : handles text parsing ]
  • [ typecode : handles file type determinations ]
  • [ scancode : CLI and API to scancode, the core part ]

This subsection will contain detailed information/APIs on these parts of scancode-toolkit, in corresponding subsubsections. The development guidelines will move to another page or to another section with smaller subsections.

[ 6. Restructuring the FAQ page (Scancode-Toolkit) ]

The FAQ page at present has questions that can be better answered elsewhere and should be restructured as separate How To’s, Tutorials and Reference documents.

  • How does ScanCode work? This issue is referenced in [ 2. Discussions explaining the Code Scanning ] and will be an entirely separate section with much more detail.
  • How to Add New License Rules for Enhanced Detection? This issue is already discussed in Improving the existing How-To’s; the documentation will be moved there.
  • How to add a new license detection rule? This could be made into a separate “How To” post and elaborated on.
  • How To get started with Development? There’s already a separate development page and the information overlaps quite a lot. The restructuring of the development page has already been discussed above.
  • Steps to cut a new release: This can be transformed into a separate “How To Cut a new release”.
  • Find more FAQ questions that answer generic questions about the project and don’t fall in the “How To”/“Tutorial” categories.