Is there a way to do a syntactic/semantic diff of C code?
NEC2 was originally written in Fortran and there have been two different ports to C from the original Fortran (xnec2c and necpp).
The variable and function names are similar (usually exactly identical). However, the authors chose different data structures for global values.
Is there a way to diff these two implementations to see if there are any actual differences, not just differences in naming convention?
It seems that it could be possible, since refactoring tools like Coccinelle have some structural awareness and understand datatypes.
If so, then bugs in one program or the other caused by author error could be detected through such a static analysis. A human could then compare the C implementations to the original Fortran to see which one is correct, or whether the syntactically different representation is computationally equivalent. Note that this is static analysis: we just want to know whether the code structure (branches, expressions, and therefore datatypes) is the same.
For example, these two samples compute the same thing but the storage of variables icon1 and ind1 differ:
    if( -icon1[iprx] != jx )
      ind1=2;
    else
    {
      xi= fabsl( cabj* cab[iprx]+ sabj* sab[iprx]+ salpj* salp[iprx]);
      if( (xi < 0.999999) || (fabsl(bi[iprx]/b-1.) > 1.e-6) )
        ind1=2;
      else
        ind1=0;
    }
versus
    if( -data.icon1[iprx] != jx )
      dataj.ind1=2;
    else
    {
      xi= fabs( dataj.cabj* data.cab[iprx]+ dataj.sabj*
          data.sab[iprx]+ dataj.salpj* data.salp[iprx]);
      if( (xi < 0.999999) ||
          (fabs(data.bi[iprx]/dataj.b-1.0) > 1.0e-6) )
        dataj.ind1=2;
      else
        dataj.ind1=0;
    } /* if( -data.icon1[iprx] != jx ) */
Solution 1:[1]
This task seems difficult to automate. Even a seasoned programmer would have great difficulty, and get a splitting headache, trying to find meaningful semantic differences between these two bodies of code.
The obvious difference is the use of fabs() vs fabsl(), the long double version. But below the surface, one needs to keep track of the definitions of all the variables and functions used, most importantly the data types. The variables bear identical names, but might be defined with subtly different types, computing different results, often very close, but sometimes very different.
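To make the type issue concrete, here is a minimal sketch of the tolerance test from the question written once with double and fabs() and once with long double and fabsl(). The function names and parameters are hypothetical and only for illustration; they do not come from xnec2c or necpp.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: the tolerance test evaluated entirely in double. */
bool differs_double(double bi, double b)
{
    return fabs(bi / b - 1.0) > 1.0e-6;
}

/* The same test with long double operands and fabsl(). The literals are
 * still double constants, as in the first sample of the question, and are
 * converted to long double before the arithmetic and the comparison. */
bool differs_long_double(long double bi, long double b)
{
    return fabsl(bi / b - 1.) > 1.e-6;
}

/* Returns true when both versions take the same branch for these inputs. */
bool same_branch(double bi, double b)
{
    return differs_double(bi, b) == differs_long_double(bi, b);
}
```

For most inputs same_branch() returns true. On platforms where long double is wider than double, inputs whose ratio lands very close to the 1.e-6 threshold may take different branches, and that is exactly the kind of difference a purely textual or structural diff cannot see.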
Writing a C program that performs a semantic comparison of two source files while ignoring whitespace and comments is feasible, albeit non-trivial. Handling differences in local variable names, function names, structure member names and layout is much more difficult, but probably doable. Handling refactoring, such as rewriting while loops as for loops or converting tests to ternary expressions, is more work but not impossible. But if types change, one would need a thorough analysis of value ranges and precision to ensure semantic identity, which seems quite difficult.
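As a rough illustration of the easiest step only (ignoring whitespace and comments), here is a minimal sketch. All names are hypothetical. It is not a real tokenizer: string and character literals are not handled, and operator spacing such as `a - -b` versus `a--b` is not distinguished. The only token boundary it preserves is a gap between two identifier or number characters.

```c
#include <ctype.h>
#include <stdbool.h>

/* Cursor over a source string; prev is the last character returned. */
struct cursor { const char *p; char prev; };

static bool is_ident(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Return the next significant character, or '\0' at the end of input.
 * Whitespace and comments are skipped; a single ' ' is returned when the
 * skipped gap separates two identifier characters ("int x" is not "intx"). */
char next_char(struct cursor *c)
{
    bool gap = false;
    for (;;) {
        if (isspace((unsigned char)*c->p)) {
            c->p++;
            gap = true;
        } else if (c->p[0] == '/' && c->p[1] == '*') {
            c->p += 2;
            while (*c->p && !(c->p[0] == '*' && c->p[1] == '/'))
                c->p++;
            if (*c->p)
                c->p += 2;          /* step over the closing marker */
            gap = true;
        } else if (c->p[0] == '/' && c->p[1] == '/') {
            while (*c->p && *c->p != '\n')
                c->p++;
            gap = true;
        } else {
            break;
        }
    }
    if (gap && is_ident(c->prev) && is_ident(*c->p)) {
        c->prev = ' ';
        return ' ';
    }
    c->prev = *c->p;
    if (*c->p)
        c->p++;
    return c->prev;
}

/* True when both sources match once whitespace and comments are ignored. */
bool same_ignoring_space_and_comments(const char *a, const char *b)
{
    struct cursor ca = { a, '\0' }, cb = { b, '\0' };
    for (;;) {
        char x = next_char(&ca), y = next_char(&cb);
        if (x != y)
            return false;
        if (x == '\0')
            return true;
    }
}
```

With this, `ind1=2;` and `ind1 = 2 ; /* set */` compare equal, while `fabs(a)` and `fabsl(a)` do not. The two samples in the question would still compare as different because of the `data.` and `dataj.` prefixes, which is where the harder renaming and layout analysis begins.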
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
