Class LicenseTextHelper


  • public class LicenseTextHelper
    extends Object
    Static helper class for comparing license text
    Author:
    Gary O'Neall
    • Field Detail

      • TOKEN_SPLIT_PATTERN

        public static final Pattern TOKEN_SPLIT_PATTERN
      • PUNCTUATION

        protected static final Set<String> PUNCTUATION
      • SKIPPABLE_TOKENS

        protected static final Set<String> SKIPPABLE_TOKENS
      • NORMALIZE_TOKENS

        public static final Map<String,String> NORMALIZE_TOKENS
    • Method Detail

      • isLicenseTextEquivalent

        public static boolean isLicenseTextEquivalent(String licenseTextA,
                                                      String licenseTextB)
        Returns true if the two license texts are considered a match per the SPDX License Matching Guidelines documented at spdx.org. Two features of the guidelines are not implemented: bullets and numbering are not considered, and comments with no whitespace separating them from the text are not skipped.
        Parameters:
        licenseTextA - text to compare
        licenseTextB - text to compare
        Returns:
        true if the license text is equivalent
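To illustrate the general idea of token-based comparison, here is a minimal sketch. It is not the library's algorithm: it only ignores case and whitespace layout, while the real method also applies the SPDX matching guidelines. The class `EquivalenceSketch` and the method `roughlyEquivalent` are hypothetical names introduced for this example.

```java
import java.util.Arrays;
import java.util.Locale;

final class EquivalenceSketch {
    // Hypothetical helper: treats two texts as equivalent when their
    // lowercased, whitespace-delimited token sequences are identical.
    public static boolean roughlyEquivalent(String a, String b) {
        return Arrays.equals(tokens(a), tokens(b));
    }

    private static String[] tokens(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        // Splitting an empty string would yield one empty token, so handle it explicitly.
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(roughlyEquivalent("Permission is  hereby\nGRANTED", "permission is hereby granted"));
        System.out.println(roughlyEquivalent("Permission is granted", "Permission is denied"));
    }
}
```

Comparing token sequences rather than raw strings is what makes differences in line wrapping and capitalization irrelevant to the result.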
      • tokenizeLicenseText

        public static String[] tokenizeLicenseText(String licenseText,
                                                   Map<Integer,LineColumn> tokenToLocation)
        Tokenizes the license text, normalizes quotes, lowercases the text, and converts multi-words into single tokens for more reliable equivalence comparisons
        Parameters:
        tokenToLocation - location for all of the tokens
        licenseText - text to tokenize
        Returns:
        array of tokens from the licenseText
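The sketch below shows one way a tokenizer can fill a token-index-to-location map while it splits the text. It is a simplified stand-in, not the library's implementation: the `Location` class is a hypothetical substitute for `LineColumn`, the 1-based line and 0-based column conventions are assumptions, and the quote normalization and multi-word handling described above are omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenizeSketch {
    // Hypothetical stand-in for the library's LineColumn type.
    static final class Location {
        final int line;   // assumed 1-based line number
        final int column; // assumed 0-based column of the token's first character
        Location(int line, int column) {
            this.line = line;
            this.column = column;
        }
    }

    private static final Pattern NON_SPACE = Pattern.compile("\\S+");

    // Splits the text into lowercased whitespace-delimited tokens and
    // records where each token starts, keyed by its index in the result.
    public static String[] tokenize(String text, Map<Integer, Location> tokenToLocation) {
        List<String> tokens = new ArrayList<>();
        String[] lines = text.split("\\R", -1);
        for (int i = 0; i < lines.length; i++) {
            Matcher m = NON_SPACE.matcher(lines[i]);
            while (m.find()) {
                tokenToLocation.put(tokens.size(), new Location(i + 1, m.start()));
                tokens.add(m.group().toLowerCase(Locale.ROOT));
            }
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        Map<Integer, Location> locations = new HashMap<>();
        String[] tokens = tokenize("Permission is\n  hereby granted", locations);
        Location third = locations.get(2);
        System.out.println(tokens[2] + " at line " + third.line + ", column " + third.column);
    }
}
```

Keeping the locations in a separate map lets a caller report where a mismatch occurred in the original text even though the tokens themselves have been normalized.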
      • getTokenAt

        public static String getTokenAt(String[] tokens,
                                        int tokenIndex)
        Fetches the token at the given index after checking the range. Returns null if the index is out of range.
        Parameters:
        tokens - array of tokens
        tokenIndex - index of token to retrieve
        Returns:
        the token at the index or null if the token does not exist
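A range-checked fetch of this kind can be sketched as follows. The class name `TokenAtSketch` is hypothetical, and returning null for a null array is an assumption that the documentation above does not state.

```java
final class TokenAtSketch {
    // Returns the token at tokenIndex, or null when the index falls outside the array.
    public static String tokenAt(String[] tokens, int tokenIndex) {
        // Assumption: a null array is treated like an empty one.
        if (tokens == null || tokenIndex < 0 || tokenIndex >= tokens.length) {
            return null;
        }
        return tokens[tokenIndex];
    }

    public static void main(String[] args) {
        String[] tokens = {"permission", "is", "granted"};
        System.out.println(tokenAt(tokens, 1));
        System.out.println(tokenAt(tokens, 3));
    }
}
```

Returning null instead of throwing lets a comparison loop look ahead past the end of the token array without guarding every access.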
      • canSkip

        public static boolean canSkip(String token)
        Returns true if the token can be ignored per the rules
        Parameters:
        token - token to check
        Returns:
        true if the token can be ignored per the rules
      • tokensEquivalent

        public static boolean tokensEquivalent(String tokenA,
                                               String tokenB)
        Returns true if the two tokens can be considered equivalent per the SPDX license matching rules
        Parameters:
        tokenA - token to compare
        tokenB - token to compare
        Returns:
        true if tokenA is equivalent to tokenB
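One way to picture token equivalence is a lookup table that maps spelling variants to a canonical form before comparing. The sketch below assumes that approach; the two table entries and the handling of null tokens are illustrative assumptions, not the contents of `NORMALIZE_TOKENS` or the library's actual logic.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class TokenEquivalenceSketch {
    // Hypothetical normalization table with two illustrative entries.
    private static final Map<String, String> NORMALIZE = new HashMap<>();
    static {
        NORMALIZE.put("licence", "license");
        NORMALIZE.put("favour", "favor");
    }

    // Two tokens are treated as equivalent when their normalized forms are equal.
    public static boolean tokensEquivalent(String tokenA, String tokenB) {
        if (tokenA == null || tokenB == null) {
            // Assumption: a missing token only matches another missing token.
            return tokenA == null && tokenB == null;
        }
        return normalize(tokenA).equals(normalize(tokenB));
    }

    private static String normalize(String token) {
        String lower = token.trim().toLowerCase(Locale.ROOT);
        return NORMALIZE.getOrDefault(lower, lower);
    }

    public static void main(String[] args) {
        System.out.println(tokensEquivalent("Licence", "license"));
        System.out.println(tokensEquivalent("license", "permit"));
    }
}
```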
      • replaceSpaceComma

        public static String replaceSpaceComma(String s)
        Replace different forms of space with a normalized space and different forms of commas with a normalized comma
        Parameters:
        s - input string
        Returns:
        the input string with all UTF-8 spaces replaced by " " and all UTF-8 commas replaced by ","
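A replacement of this kind can be sketched with two character classes. The character sets below are assumptions chosen for illustration (a handful of space-like and comma-like code points); the set the library actually recognizes may differ. The class name `SpaceCommaSketch` is hypothetical.

```java
import java.util.regex.Pattern;

final class SpaceCommaSketch {
    // Assumed space-like characters: no-break space, U+2000 to U+200A,
    // narrow no-break space, medium mathematical space, ideographic space.
    private static final Pattern SPACES =
            Pattern.compile("[\\u00A0\\u2000-\\u200A\\u202F\\u205F\\u3000]");
    // Assumed comma-like characters: fullwidth, ideographic, and small commas.
    private static final Pattern COMMAS =
            Pattern.compile("[\\uFF0C\\u3001\\uFE50]");

    // Replaces each space variant with an ASCII space and each comma variant with an ASCII comma.
    public static String replaceSpaceComma(String s) {
        String spaced = SPACES.matcher(s).replaceAll(" ");
        return COMMAS.matcher(spaced).replaceAll(",");
    }

    public static void main(String[] args) {
        String input = "licensed\u00A0software\uFF0C as is";
        System.out.println(replaceSpaceComma(input));
    }
}
```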
      • replaceMultWord

        public static String replaceMultWord(String s)
        Replaces each multi-word with a single token, using a dash as the separator
        Parameters:
        s - input string
        Returns:
        the input string with each multi-word replaced by a single dash-separated token
      • normalizeText

        public static String normalizeText(String s)
        Normalizes quotes and no-break spaces
        Parameters:
        s - String to normalize
        Returns:
        String normalized for comparison
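As a rough sketch of this kind of normalization, the example below maps curly quotes to their ASCII forms and no-break spaces to ordinary spaces. Which quote characters the library handles is not stated above, so the sets used here are assumptions, and `QuoteSketch` is a hypothetical name.

```java
final class QuoteSketch {
    // Maps curly double quotes to ", curly single quotes to ', and
    // the no-break space (U+00A0) to an ordinary space.
    public static String normalize(String s) {
        return s.replaceAll("[\\u201C\\u201D]", "\"")
                .replaceAll("[\\u2018\\u2019]", "'")
                .replaceAll("\\u00A0", " ");
    }

    public static void main(String[] args) {
        System.out.println(normalize("\u201CAS IS\u201D\u00A0basis"));
    }
}
```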
      • removeLineSeparators

        public static String removeLineSeparators(String s)
        Removes line separators (---, ***, ===) from the input string
        Parameters:
        s - Input string
        Returns:
        s without any line separators (---, ***, ===)
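The separator removal can be sketched with a multiline regular expression. This version assumes a separator is a whole line made of three or more repetitions of a single one of `-`, `*`, or `=`, optionally surrounded by spaces or tabs; the library's exact rule is not documented above and may differ. `LineSeparatorSketch` is a hypothetical name.

```java
import java.util.regex.Pattern;

final class LineSeparatorSketch {
    // Assumed rule: a line consisting only of 3+ dashes, 3+ asterisks, or 3+ equals signs.
    private static final Pattern SEPARATOR_LINE =
            Pattern.compile("(?m)^[ \\t]*(-{3,}|\\*{3,}|={3,})[ \\t]*$");

    // Deletes the separator characters; the surrounding line breaks are left in place.
    public static String removeLineSeparators(String s) {
        return SEPARATOR_LINE.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeLineSeparators("Title\n=====\nBody"));
    }
}
```

Because only the separator characters are removed, the result keeps an empty line where the separator was, which a whitespace-insensitive tokenizer would then ignore.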