The Importance of Data Types in Configuration Files

A lesson from git

Written on Sat, 25 Jan 2025 08:17:05 +0530

by: Staff writer

I read an interesting blog post about a strange timeout behaviour of git autocorrect: after a mistyped command, git waits only 0.1 seconds before going ahead with its auto-"corrected" version. 1

Types are hard, but don't have to be

Types are your friends.

Source of the strange setting

The blogger digs deeper and finds that this is the result of two innocuous changes that combine to deliver the impossibly short interval of 0.1 seconds:

  1. A patch that added the help.autocorrect setting as a Boolean, 2
  2. followed by a later patch that changed the Boolean to an Integer representing the number of deciseconds to wait before going ahead with the auto-corrected version. 3

A deadly combination

Git documentation states 4 that a Boolean config value

may be given as yes/no, 0/1, true/false or on/off.

Thus help.autocorrect=1 would be interpreted the same as help.autocorrect=true, which the git documentation explicitly allows. So far so good. The problems arise when the setting is changed to an Integer: help.autocorrect=1 now silently stops meaning "true" and starts meaning 1 decisecond. None of this is evident to a person reading the git config, where help.autocorrect=1 merrily sits, implying that the autocorrect feature is "true" or "on." When run, git (correctly) waits for 1 decisecond, or 100 milliseconds, before executing its autocorrected guess of a mistyped command. A classic footgun.
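To make the ambiguity concrete, here is a minimal Java sketch of the two readings of the same raw value. It is not git's actual code: the class AutocorrectAmbiguity and both method names are hypothetical, and the Boolean reader is only modelled on the spellings listed in the documentation quoted above.

```java
import java.util.Locale;
import java.util.Set;

final class AutocorrectAmbiguity {
    // Spellings taken from the documentation quote; the reader itself is an assumption.
    private static final Set<String> TRUE_WORDS = Set.of("yes", "on", "true", "1");
    private static final Set<String> FALSE_WORDS = Set.of("no", "off", "false", "0");

    // Reading 1: the value is a Boolean switch.
    static boolean asGitStyleBoolean(String raw) {
        final String v = raw.trim().toLowerCase(Locale.ROOT);
        if (TRUE_WORDS.contains(v)) {
            return true;
        }
        if (FALSE_WORDS.contains(v)) {
            return false;
        }
        throw new IllegalArgumentException("not a Boolean: " + raw);
    }

    // Reading 2: the value is a number of deciseconds, returned here in milliseconds.
    static long asDecisecondsInMillis(String raw) {
        return Integer.parseInt(raw.trim()) * 100L;
    }

    public static void main(String[] args) {
        final String raw = "1"; // exactly what sits in the config file
        System.out.println("as Boolean: " + asGitStyleBoolean(raw));
        System.out.println("as delay in ms: " + asDecisecondsInMillis(raw));
    }
}
```

Both readers accept "1" without complaint, which is exactly why nothing warns the user when the meaning of the setting changes underneath them.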

Preventive measures are better than Corrective measures

The blogpost goes on to propose a patch that mitigates this problem to some extent. But we are more interested in how to prevent this kind of thing from developing in the first place. With the benefit of hindsight, we can identify some key points where a more robust design would have helped.

  1. The name of the configuration setting was ambiguous. If the intent was to make a Boolean configuration, it should have been named help.autocorrect_enabled or help.autocorrect_on instead of help.autocorrect. That makes it evident to the user that the value is supposed to be a Boolean, and to the developer making future changes that it is not an Integer, even though it may have a value of "1".
  2. When the option to wait for a period of time before executing the action was added, it should not have been overloaded onto the pre-existing setting. Instead, a new setting, for example help.autocorrect_timeout or help.autocorrect_duration, should have been introduced. Program logic should have checked whether help.autocorrect_enabled was true, and only then applied the timeout given in help.autocorrect_duration.

Both of which boil down to

Naming things is hard
Which the great Phil Karlton may or may not have said.

Getting out the big guns

Wouldn't it be nice if we could, for our own applications, mitigate this class of problems by design? It is indeed possible, using the same technology that we use in our programming languages: types. Settings must have clear, unambiguous types. Type validation must always be done immediately after parsing a config file. A validation failure must be reported as an erroneous configuration, and the program should not continue.

In the above incident, treating "0/1" as a valid value for a Boolean setting is a design mistake, because it is necessarily ambiguous with Integer. In fact, git config documentation itself states: 4

when converting value to the canonical form using --bool type specifier; git config will ensure that the output is "true" or "false".
No value other than "true" or "false" should be accepted in a Boolean setting.

This leads directly to the second design mitigation: the name of the setting must reflect the type of value it represents. Thus help.autocorrect, if representing a Boolean, should be named help.autocorrect_enabled or similar, which makes it explicit that it is a Boolean. Note that this is most effective when combined with type validation.
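One way to combine the two mitigations is to let the name decide which type check is applied. The sketch below is only an illustration under assumed conventions: the class ConfigTypeCheck, and the rule that keys ending in "Enabled" are Booleans while keys ending in "Duration" are non-negative integers, are hypothetical choices, not something git or any library prescribes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

final class ConfigTypeCheck {
    // Collects one message per setting whose value does not match the type
    // implied by its name (assumed convention: suffix declares the type).
    static List<String> validate(Properties props) {
        final List<String> errors = new ArrayList<>();
        for (final String key : props.stringPropertyNames()) {
            final String value = props.getProperty(key).trim();
            if (key.endsWith("Enabled")) {
                // Strict Boolean: nothing but "true" or "false" is accepted.
                if (!value.equals("true") && !value.equals("false")) {
                    errors.add(key + ": expected true or false, got '" + value + "'");
                }
            } else if (key.endsWith("Duration")) {
                // Up to nine digits, so the value always fits in an int.
                if (!value.matches("\\d{1,9}")) {
                    errors.add(key + ": expected a non-negative integer, got '" + value + "'");
                }
            }
        }
        return errors;
    }

    // Fail fast: refuse to continue with an erroneous configuration.
    static void requireValid(Properties props) {
        final List<String> errors = validate(props);
        if (!errors.isEmpty()) {
            throw new IllegalStateException("Invalid configuration: " + errors);
        }
    }
}
```

Calling requireValid right after loading the file means a stray help.autoCorrectEnabled=1 stops the program at startup instead of quietly changing its behaviour.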

Structured configuration

In a complex enough application, configuration complexity is itself a concern and must be managed. In Java applications, the go-to configuration format is Java Properties. 5

It is a simple, flat format, representing String->String pairs as key->value, not unlike the git config format seen above. For example:

Key1 = Value1
Key2 = Value2
Key3 : Value3

all represent "KeyN" strings being assigned "ValueN" values. The parser reads them as String values, and leaves any validation to be performed at the application level. Not a bad design for simple programs, as long as the lessons learned above about naming settings and validating types are followed.
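A small sketch shows the parser's String-only view of the world. The class FlatPropertiesDemo and the count key are made up for illustration; Properties.load and getProperty are the standard java.util.Properties calls.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class FlatPropertiesDemo {
    // Parses properties text held in memory instead of a file.
    static Properties parse(String text) {
        final Properties props = new Properties();
        try (StringReader reader = new StringReader(text)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        final Properties props = parse("Key1 = Value1\nKey3 : Value3\ncount = 42\n");
        // Both separators are accepted, and the surrounding spaces are dropped.
        System.out.println(props.getProperty("Key1"));
        System.out.println(props.getProperty("Key3"));
        // Even a numeric-looking value is stored as a String.
        System.out.println(props.get("count") instanceof String);
    }
}
```

Whether "42" is a count, a duration or a misspelt Boolean is something only the application can decide.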

The case of autocorrect would be handled thus:

# Configuration:
help.autoCorrectEnabled=true
help.autoCorrectTimeoutDuration=5

Validation (Java 23) could be as follows:


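import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;

final class AutoCorrectConfig {
	public static void main(String[] args) throws IOException {
		final var props = new Properties();
		try (final var fread = new FileReader("config.properties")) {
			props.load(fread);
		}
		final String helpAutoCorrectEnabledStr = props.getProperty("help.autoCorrectEnabled",
			"false" // default value
		);
		// Boolean.valueOf would silently turn "1" or "yes" into false, so compare explicitly.
		if (!helpAutoCorrectEnabledStr.equals("true") && !helpAutoCorrectEnabledStr.equals("false")) {
			throw new IllegalStateException("help.autoCorrectEnabled must be true or false");
		}
		final boolean helpAutoCorrectEnabled = Boolean.parseBoolean(helpAutoCorrectEnabledStr);
		if (helpAutoCorrectEnabled) {
			final String timeoutStr = props.getProperty("help.autoCorrectTimeoutDuration",
				"50" // default is 50 deciseconds
			);
			// Integer.parseInt throws NumberFormatException for a non-integer value.
			final int timeout = Integer.parseInt(timeoutStr);
			// Use timeout value...
		}
	}
}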
				

Going further

If the configuration is more complex than can be handled by Java properties, we can switch to XML-based structured configuration. Its advantage is that values can be type-checked using XML Schema data types, independently of the application code itself. But that is a whole different ball game, and will require a series of blog posts!
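As a small taste of that approach, the sketch below validates an XML configuration against an XML Schema using the javax.xml.validation API that ships with the JDK. The schema, the element names and the class XsdConfigCheck are all hypothetical. One detail worth noticing: the built-in xs:boolean type itself accepts "1" and "0", so the sketch narrows it with a pattern to stay consistent with the advice above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class XsdConfigCheck {
    // Hypothetical schema: a strict Boolean plus a non-negative integer duration.
    static final String XSD = """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <xs:simpleType name="strictBoolean">
            <xs:restriction base="xs:boolean">
              <xs:pattern value="true|false"/>
            </xs:restriction>
          </xs:simpleType>
          <xs:element name="help">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="autoCorrectEnabled" type="strictBoolean"/>
                <xs:element name="autoCorrectTimeoutDuration" type="xs:nonNegativeInteger"/>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:schema>
        """;

    // Compiles the schema; a broken schema is a programming error, not a config error.
    static Schema compileSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    // Returns true when the XML text conforms to the schema, false otherwise.
    static boolean isValid(String xml) {
        try {
            compileSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error reported by the validator
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Here the type rules live in the schema rather than in application code, so the same check can be run by an editor or a build step before the program ever starts.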

Do you have software problems like this that you need solved? Drop us an e-mail at [email protected] and see if we can help!


1. https://blog.gitbutler.com/why-is-git-autocorrect-too-fast-for-formula-one-drivers/

2. https://public-inbox.org/git/[email protected]/

3. https://public-inbox.org/git/[email protected]/?ref=blog.gitbutler.com

4. https://linux.die.net/man/1/git-config

5. https://docs.oracle.com/en/java/javase/23/docs/api/java.base/java/util/Properties.html