Interesting Things: A Software Development Blog
Table of Contents
Some Performance Observations
“The most amazing achievement of the computer software industry is its continuing cancellation of the steady and staggering gains made by the computer hardware industry.” – Henry Petroski
All the results below were the best of 10 consecutive executions.
Getting the Time
You often need to time certain parts of your code. How accurate you need it to be depends on what you are timing, but for most purposes you don’t want the timing itself to contribute to any performance degradation.
Here is a little test to see how clock_gettime()
performs for each of
the different counters.
#include <stdio.h> #include <time.h> int main (int argc, char **argv) { static const struct { clockid_t id; const char *s; } clockids[] = { { CLOCK_REALTIME, "CLOCK_REALTIME" }, // 0 { CLOCK_REALTIME_COARSE, "CLOCK_REALTIME_COARSE" }, // 1 { CLOCK_MONOTONIC, "CLOCK_MONOTONIC" }, // 2 { CLOCK_MONOTONIC_COARSE, "CLOCK_MONOTONIC_COARSE" }, // 3 { CLOCK_MONOTONIC_RAW, "CLOCK_MONOTONIC_RAW" }, // 4 { CLOCK_BOOTTIME, "CLOCK_BOOTTIME" }, // 5 { CLOCK_PROCESS_CPUTIME_ID, "CLOCK_PROCESS_CPUTIME_ID" }, // 6 { CLOCK_THREAD_CPUTIME_ID, "CLOCK_THREAD_CPUTIME_ID" }, // 7 }; (void)argc; if (!argv[1]) { fprintf (stderr, "Need to specify a clock to use (0-7)\n"); return -1; } size_t index = (size_t)-1; if ((sscanf (argv[1], "%zu", &index) != 1) || index > 7) { fprintf (stderr, "Index %s is invalid\n", argv[1]); return -1; } printf ("Testing %s\n", clockids[index].s); for (size_t j=0; j<5000 * 100000; j++) { struct timespec ts; clock_gettime (clockids[index].id, &ts); } return 0; }
Here’s the results, timed with the time
command for each of the clock
counters available:
Counter | Duration |
---|---|
CLOCK_REALTIME |
10.805s |
CLOCK_REALTIME_COARSE |
4.180s |
CLOCK_MONOTONIC |
10.878s |
CLOCK_MONOTONIC_COARSE |
4.049s |
CLOCK_MONOTONIC_RAW |
10.724s |
CLOCK_BOOTTIME |
10.597s |
CLOCK_PROCESS_CPUTIME_ID |
227.224s |
CLOCK_THREAD_CPUTIME_ID |
225.883s |
Conclusion
- Read the manual carefully.
- If you can, use the
COARSE
counters. - Make sure you really, really need a
CPUTIME_ID
counter.
Unintended Allocations
Everyone knows that allocating memory is expensive; in a tight loop you want to use as much of the stack or preallocated memory as you can. What you want vs what you get are two different things. The first program below is idiomatic C++, the second is not. The second program takes about 40% of the time that the first program does.
int main (void) { std::string line; for (int player_num=0; player_num<1000; player_num++) { for (int points=0; points<100; points++) { for (int moves=0; moves<100; moves++) { line = "Player " + std::to_string(player_num) + " " + "scored " + std::to_string(points) + " points " + "in " + std::to_string(moves) + " moves." + "\n"; std::cout << line << "\n"; } } } return 0; }
int main (void) { char line[1024]; for (int player_num=0; player_num<1000; player_num++) { for (int points=0; points<100; points++) { for (int moves=0; moves<100; moves++) { snprintf (line, sizeof line, "Player %i scored %i points in %i moves\n", player_num, points, moves); std::cout << line << "\n"; } } } return 0; }
Here is the result of the best of ten runs:
Program | Duration |
---|---|
std::string |
real 0m8.348s |
snprintf |
real 0m3.263s |
Conclusion
Unless your strings are very very tiny, simply using std::string
in a
performance sensitive hot path is a really bad idea.
Using Regexes for Simple String Matching
Sure, it’s nice when an application lets the user specify a regex as a search term, but exactly how much slower will it be when the search term is an exact string and not a pattern?
In other words, most users don’t know that the search term they are supplying is being parsed as a regular expression.
Here’s a little program that compare the execution speed of an exact match search and a regex search.
#define SEARCH_TYPE_NONE -1 #define SEARCH_TYPE_EXACT 1 #define SEARCH_TYPE_REGEX 2 int main (int argc, char **argv) { (void)argc; if (!argv[1]) { fprintf (stderr, "Missing search type (exact or regex\n"); return -1; } int stype = SEARCH_TYPE_NONE; if ((strcmp (argv[1], "exact")) == 0) { stype = SEARCH_TYPE_EXACT; } if ((strcmp (argv[1], "regex")) == 0) { stype = SEARCH_TYPE_REGEX; } if (stype != SEARCH_TYPE_EXACT && stype != SEARCH_TYPE_REGEX) { fprintf (stderr, "Search type [%s] unrecognised\n", argv[1]); return -1; } if (!argv[2]) { fprintf (stderr, "Missing search term\n"); return -1; } if (!argv[3]) { fprintf (stderr, "Missing repeat count for matching\n"); return -1; } size_t count = 0; if ((sscanf (argv[3], "%zu", &count)) != 1) { fprintf (stderr, "Count value [%s] is invalid\n", argv[3]); return -1; } regex_t re; if ((regcomp (&re, argv[2], REG_EXTENDED | REG_NOSUB)) != 0) { fprintf (stderr, "Failed to compile regex\n"); return -1; } char input[1024]; while (count-- != 0 && !ferror (stdin) && !feof (stdin) && fgets (input, sizeof input, stdin)) { char *eol = strchr (input, '\n'); if (eol) *eol = 0; if (stype == SEARCH_TYPE_EXACT) { printf ("exact [%i]\n", strcmp (argv[2], input)); } else { printf ("re [%i]\n", regexec (&re, input, 0, NULL, 0)); } } regfree (&re); return 0; }
The tests were run on multiple inputs of abc
, abcdef
and abcdefghi
,
all generated by the yes
program and piped into the program above. The
search count was set to 10m.
The results are as follows:
Search term | Search type | Duration |
---|---|---|
abc | Regex | real 0m2.493s |
abcdef | Regex | real 0m2.620s |
abcdefghi | Regex | real 0m2.893s |
abc | Exact match | real 0m1.492s |
abcdef | Exact match | real 0m1.480s |
abcdefghi | Exact match | real 0m1.454s |
Conclusion
Unless your users:
- Know how to construct regular expression, and
- Have a burning need for very flexible search terms
You’re better off using the exact match function in your code1. The exact match searching executes in about 60% of the time that the regular expression does, with better performance on longer strings.2
Footnotes:
Your language of choice almost certainly includes a very highly optimised string matching function.
I would be surprised if strcmp()
does not switch to a vectorised
implementation pass a certain length. This would explain why the
performance is better when the string length goes over a particular point.