Analyses the languages of all files in a given folder or folders and collates the results.
Powered by github-linguist, although it doesn't need to be installed.
Node.js must be installed to be able to use LinguistJS.
LinguistJS is available on npm as linguist-js.
Install locally using npm install linguist-js and import it into your code like so:
const linguist = require('linguist-js');
Or install globally using npm install -g linguist-js and run using the CLI command linguist or linguist-js.
linguist --help
linguist-js --help
LinguistJS analyses a folder, or a dictionary of already-read file content, and determines what programming languages are used within.
As an example, take the following file structure:
/
| src
| | cli.js 1kB
| | index.ts 2kB
| readme.md 3kB
| no-lang 10B
| x.pluginspec 10B
(or, an object with keys "/src", "/src/cli.js", ... and preloaded file content)
Running LinguistJS on this will return the following JSON:
{
  "files": {
    "count": 5,
    "bytes": 6020,
    "lines": {
      "total": 100,
      "content": 90
    },
    "results": {
      "/src/index.ts": "TypeScript",
      "/src/cli.js": "JavaScript",
      "/readme.md": "Markdown",
      "/no-lang": null,
      "/x.pluginspec": "Ruby"
    }
  },
  "languages": {
    "count": 4,
    "bytes": 6010,
    "lines": {
      "total": 90,
      "content": 80
    },
    "results": {
      "JavaScript": { "bytes": 1000, "lines": { "total": 49, "content": 49 } },
      "Markdown": { "bytes": 3000, "lines": { "total": 10, "content": 5 } },
      "Ruby": { "bytes": 10, "lines": { "total": 1, "content": 1 } },
      "TypeScript": { "bytes": 2000, "lines": { "total": 30, "content": 25 } }
    }
  },
  "unknown": {
    "count": 1,
    "bytes": 10,
    "lines": {
      "total": 10,
      "content": 10
    },
    "filenames": {
      "no-lang": 10
    },
    "extensions": {}
  },
  "repository": {
    "JavaScript": { "type": "programming", "color": "#f1e05a" },
    "Markdown": { "type": "prose", "color": "#083fa1" },
    "Ruby": { "type": "programming", "color": "#701516" },
    "TypeScript": { "type": "programming", "color": "#2b7489" }
  }
}
- File paths in the output use only forward slashes as delimiters, even on Windows.
- Unless running in offline mode, do not rely on any language classification output from LinguistJS staying the same between runs.
  Language data is fetched on each run from the latest classifications of github-linguist. This data is subject to change at any time, which may change the results of a run even when using the same version of LinguistJS.
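As a rough illustration of consuming this output, the sketch below turns the per-language byte counts into percentage shares. The sample object copies the byte counts from the example output above; `languageShares` is a hypothetical helper written for this example and is not part of LinguistJS.

```javascript
// Sample shaped like the `languages.results` object in the example output.
const sampleResults = {
  JavaScript: { bytes: 1000 },
  Markdown: { bytes: 3000 },
  Ruby: { bytes: 10 },
  TypeScript: { bytes: 2000 },
};

// Hypothetical helper: convert byte counts into percentage shares, largest first.
function languageShares(results) {
  const totalBytes = Object.values(results).reduce((sum, { bytes }) => sum + bytes, 0);
  return Object.entries(results)
    .map(([name, { bytes }]) => ({
      name,
      bytes,
      // Guard against an empty result set to avoid dividing by zero.
      percent: totalBytes === 0 ? 0 : (bytes / totalBytes) * 100,
    }))
    .sort((a, b) => b.bytes - a.bytes);
}

for (const { name, percent } of languageShares(sampleResults)) {
  console.log(`${name}: ${percent.toFixed(2)}%`);
}
```

Because classifications can change between online runs, shares computed this way are best treated as a snapshot rather than a stable value.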
import linguist from 'linguist-js';

// Analyse folders on disc
const folders = ['./src'];
const folderOptions = { keepVendored: false, quick: false };
const { files, languages, unknown, repository } = await linguist.analyseFolders(folders, folderOptions);

// Analyse file content from raw input
const fileContent = {
  'file1.ts': '#!/usr/bin/env node',
  'file2.ts': 'console.log("Example");',
  'ignoreme.js': 'ignored!',
};
const rawOptions = { ignoredFiles: ['ignoreme.*'] };
// rawResults has the same { files, languages, unknown, repository } shape as above
const rawResults = await linguist.analyseRawContent(fileContent, rawOptions);
Exports:
- analyseFolders(folders?, opts?): Analyse the language of all files found in a folder or folders.
  - folders (optional; string array): A list of folders to analyse (defaults to ['./']).
  - opts (optional; object): An object containing analyser options.
- analyseRawContent(fileContent, opts?): Analyse the language of files supplied as already-read content instead of being read from disc.
  - fileContent (object): A dictionary mapping file names to their content.
  - opts (optional; object): An object containing analyser options.
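A dictionary for analyseRawContent can be assembled from files that have already been read. The sketch below is one way to do that with Node's built-in fs module; `loadFileContent` is a hypothetical helper, and keying entries by forward-slash paths is an assumption made to match the output paths, not a documented requirement.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: read each file and key its content by a forward-slash path.
function loadFileContent(filePaths) {
  const fileContent = {};
  for (const filePath of filePaths) {
    // Normalise Windows separators so keys look like the paths in the output.
    const key = filePath.split(path.sep).join('/');
    fileContent[key] = fs.readFileSync(filePath, 'utf8');
  }
  return fileContent;
}
```

The returned object could then be passed as the first argument to analyseRawContent, assuming linguist-js is installed.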
Analyser options:
- ignoredFiles (string array): A list of file path globs to explicitly ignore.
- ignoredLanguages (string array): A list of languages to ignore.
- categories (string array): A list of programming language categories that should be included in the results. Defaults to ['data', 'markup', 'programming', 'prose'].
- childLanguages (boolean): Whether to display sub-languages instead of their parents when possible (defaults to false).
- quick (boolean): Whether to skip complex language analysis such as the checking of heuristics and gitattributes statements (defaults to false). Alias for checkAttributes:false, checkIgnored:false, checkDetected:false, checkHeuristics:false, checkShebang:false, checkModeline:false.
- offline (boolean): Whether to use pre-packaged metadata files instead of fetching them from GitHub at runtime (defaults to false).
- calculateLines (boolean): Whether to calculate line of code totals (defaults to true).
- keepVendored (boolean): Whether to keep vendored files (dependencies, etc) (defaults to false). Does nothing when fileContent is set.
- keepBinary (boolean): Whether binary files should be included in the output (defaults to false).
- relativePaths (boolean): Change the absolute file paths in the output to be relative to the current working directory (defaults to false).
- checkAttributes (boolean): Force the checking of .gitattributes files (defaults to true unless quick is set). Does nothing when fileContent is set.
- checkIgnored (boolean): Force the checking of .gitignore files (defaults to true unless quick is set). Does nothing when fileContent is set.
- checkDetected (boolean): Force files marked with linguist-detectable to show up in the output, even if the file is not part of the declared categories.
- checkHeuristics (boolean): Apply heuristics to ambiguous languages (defaults to true unless quick is set).
- checkShebang (boolean): Check shebang (#!) lines for explicit language classification (defaults to true unless quick is set).
- checkModeline (boolean): Check modelines for explicit language classification (defaults to true unless quick is set).
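The interaction between quick and the individual check options can be pictured with the sketch below. It is an illustration of the documented behaviour, not the library's own code: `resolveChecks` is a hypothetical helper, and it assumes checkDetected follows the same default as the other checks, which the list above does not state.

```javascript
// The check* options that `quick` is documented to switch off.
const QUICK_CHECKS = [
  'checkAttributes', 'checkIgnored', 'checkDetected',
  'checkHeuristics', 'checkShebang', 'checkModeline',
];

// Hypothetical helper: work out the effective value of each check* option.
function resolveChecks(opts = {}) {
  const quick = opts.quick ?? false;
  const resolved = { quick };
  for (const name of QUICK_CHECKS) {
    // An explicitly set option wins; otherwise the check is on unless `quick` is set.
    resolved[name] = opts[name] ?? !quick;
  }
  return resolved;
}

// `quick` disables every check except the one that was explicitly re-enabled.
console.log(resolveChecks({ quick: true, checkShebang: true }));
```

Under these assumptions, passing { quick: true, checkShebang: true } would skip everything except shebang detection.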
linguist --analyse [<folders...>] [<options...>]
linguist --help
linguist --version
- --analyse: Analyse the language of all files found in a folder or folders.
  - [<folders...>]: The folders to analyse (defaults to ./).
  - --ignoredFiles <globs...>: A list of file path globs to ignore.
  - --ignoredLanguages <languages...>: A list of languages to exclude from the output.
  - --categories <categories...>: A list of language categories that should be displayed in the output. Must be one or more of data, prose, programming, markup.
  - --childLanguages: Display sub-languages instead of their parents, when possible.
  - --json: Only affects the CLI output. Display the outputted language data as JSON.
  - --tree <traversal>: Only affects the CLI output. A dot-delimited traversal to the nested object that should be logged to the console instead of the entire output. Requires --json to be specified.
  - --listFiles: Only affects the visual CLI output. List each matching file and its size under each outputted language result. Does nothing if --json is specified.
  - --quick: Skip the checking of .gitattributes and .gitignore files for manual language classifications. Alias for --checkAttributes=false --checkIgnored=false --checkHeuristics=false --checkShebang=false --checkModeline=false.
  - --offline: Use pre-packaged metadata files instead of fetching them from GitHub at runtime.
  - --calculateLines: Calculate line of code totals from files.
  - --keepVendored: Include vendored files (auto-generated files, dependencies folder, etc) in the output.
  - --keepBinary: Include binary files in the output.
  - --relativePaths: Change the absolute file paths in the output to be relative to the current working directory.
  - --checkAttributes: Force the checking of .gitattributes files. Use alongside --quick to override it disabling this option.
  - --checkIgnored: Force the checking of .gitignore files. Use alongside --quick to override it disabling this option.
  - --checkDetected: Force files marked with linguist-detectable to show up in the output, even if the file is not part of the declared --categories. Use alongside --quick to override it disabling this option.
  - --checkHeuristics: Apply heuristics to ambiguous languages. Use alongside --quick to override it disabling this option.
  - --checkShebang: Check shebang (#!) lines for explicit classification. Use alongside --quick to override it disabling this option.
  - --checkModeline: Check modelines for explicit classification. Use alongside --quick to override it disabling this option.
- --help: Display the help message.
- --version: Display the current installed version of LinguistJS.
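To make the dot-delimited traversal taken by --tree more concrete, the sketch below walks a nested object one key at a time. It only illustrates the idea and is not how the CLI is implemented; `traverse` is a hypothetical helper and the sample object is a trimmed copy of the example output.

```javascript
// Hypothetical helper: follow a dot-delimited path into a nested object.
function traverse(data, dotPath) {
  return dotPath.split('.').reduce(
    // Stop descending once a key is missing instead of throwing.
    (node, key) => (node == null ? undefined : node[key]),
    data,
  );
}

// Trimmed sample shaped like the example output.
const output = {
  languages: {
    results: { JavaScript: { bytes: 1000 } },
  },
};

console.log(traverse(output, 'languages.results.JavaScript.bytes'));
```

A path such as languages.results would select the per-language results object. Keys that themselves contain dots, such as file paths, cannot be addressed by this simple split.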