Go4Expert

Go4Expert (http://www.go4expert.com/)
-   C++ (http://www.go4expert.com/articles/cpp-tutorials/)
-   -   PVS-Studio VS Chromium (http://www.go4expert.com/articles/pvs-studio-vs-chromium-t25895/)

Karpov2007 25May2011 12:39

PVS-Studio VS Chromium
 
Good has won this time. To be more exact, source codes of the Chromium project have won. Chromium is one of the best projects we have checked with PVS-Studio.

Chromium is an open-source web-browser developed by Google and intended to provide users with fast and safe Internet access. Chromium serves as the base for the Google Chrome browser. Moreover, Chromium is a preliminary version of Google Chrome as well as some other alternative web-browsers.

From the programming viewpoint, Chromium is a solution consisting of 473 projects. The general size of the source C/C++ code is about 460 Mbytes and the number of lines is difficult to count.

These 460 Mbytes include a lot of various libraries. If you exclude them, you will have about 155 Mbytes. It is much less but still a lot of lines. Moreover, everything is relative, you know. Many of these libraries were created by the Chromium developers within the task of creating Chromium itself. Although such libraries live by themselves, still we may refer them to the browser.

Chromium had become the most quality and large project I have studied during testing of PVS-Studio. While handling the Chromium project it was not actually clear to us what was checking what: we have found and fixed several errors in PVS-Studio related to C++ file analysis and support of a specific project's structure.

Many aspects and methods used in Chromium show the quality of its source code. For instance, most programmers determine the number of items in an array using the following construct:

Code:

int XX[] = { 1, 2, 3, 4 };
size_t N = sizeof(XX) / sizeof(XX[0]);

Usually it is arranged as a macro of this kind:
Code:

#define count_of(arg) (sizeof(arg) / sizeof(arg[0]))
This is a quite efficient and useful macro. To be honest, I have always used this very macro myself. However, it might lead to an error because you may accidentally pass a simple pointer to it and it will not mind. Let me explain this by the following example:

Code:

void Test(int C[3])
{
  int A[3];
  int *B = Foo();
  size_t x = count_of(A); // Ok
  x = count_of(B); // Error
  x = count_of(C); // Error
}

The count_of(A) construct works correctly and returns the number of items in the A array which is equal to three here.

But if you apply by accident count_of() to a pointer, the result will be a meaningless value. The issue is that the macro will not produce any warning for the programmer about a strange construct of the count_of(B) sort. This situation seems farfetched and artificial but I had encountered it in various applications. For example, consider this code from the Miranda IM project:
Code:

#define SIZEOF(X) (sizeof(X)/sizeof(X[0]))
int Cache_GetLineText(..., LPTSTR text, int text_size, ...)
{
  ...
  tmi.printDateTime(pdnce->hTimeZone, _T("t"), text, SIZEOF(text), 0);
  ...
}

So, such errors may well exist in your code and you'd better have something to protect yourself against them. It is even easier to make a mistake when trying to calculate the size of an array passed as an argument:
Code:

void Test(int C[3])
{
  x = count_of(C); // Error
}
According to the C++ standard, the 'C' variable is a simple pointer, not an array. As a result, you may often see in programs that only a part of the array passed is processed.

Since we have started speaking of such errors, let me tell you about a method that will help you find the size of the array passed. You should pass it by the reference:
Code:

void Test(int (&C)[3])
{
  x = count_of(C); // Ok
}

Now the result of the count_of(C) expression is value 3.

Let's return to Chromium. It uses a macro that allows you to avoid the above described errors. This is how it is implemented:

Code:

template <typename T, size_t N>
char (&ArraySizeHelper(T (&array)[N]))[N];
#define arraysize(array) (sizeof(ArraySizeHelper(array)))

The idea of this magic spell is the following: the template function ArraySizeHelper receives an array of a random type with the N length. The function returns the reference to the array of the N length consisting of 'char' items. There is no implementation for this function because we do not need it. For the sizeof() operator it is quite enough just to define the ArraySizeHelper function. The 'arraysize' macro calculates the size of the array of bytes returned by the ArraySizeHelper function. This size is the number of items in the array whose length we want to calculate.

If you have gone crazy because of all this, just take my word for it - it works. And it works much better than the 'count_of()' macro we have discussed above. Since the ArraySizeHelper function takes an array by the reference, you cannot pass a simple pointer to it. Let's write a test code:

Code:

template <typename T, size_t N>
char (&ArraySizeHelper(T (&array)[N]))[N];
#define arraysize(array) (sizeof(ArraySizeHelper(array)))

void Test(int C[3])
{
  int A[3];
  int *B = Foo();
  size_t x = arraysize(A); // Ok
  x = arraysize(B); // Compilation error
  x = arraysize(C); // Compilation error
}

The incorrect code simply will not be compiled. I think it's cool when you can prevent a potential error already at the compilation stage. This is a nice sample reflecting the quality of this programming approach. My respect goes to Google developers.

Let me give you one more sample which is of a different sort yet it shows the quality of the code as well.
Code:

if (!file_util::Delete(db_name, false) &&
    !file_util::Delete(db_name, false)) {
  // Try to delete twice. If we can't, fail.
  LOG(ERROR) << "unable to delete old TopSites file";
  return false;
}

Many programmers might find this code strange. What is the sense in trying to remove a file twice? There is a sense. The one who wrote it has reached Enlightenment and comprehended the essence of software existence. A file can be definitely removed or cannot be removed at all only in textbooks and in some abstract world. In the real system it often happens that a file cannot be removed right now and can be removed an instance later. There may be many reasons for that: antivirus software, viruses, version control systems and whatever. Programmers often do not think of such cases. They believe that when you cannot remove a file you cannot remove it at all. But if you want to make everything well and avoid littering in directories, you should take these extraneous factors into account. I encountered quite the same situation when a file would not get removed once in 1000 runs. The solution was also the same - I only placed Sleep(0) in the middle just in case.

Well, and what about the check by PVS-Studio? Chromium's code is perhaps the most quality code I've ever seen. This is confirmed by the low density of errors we've managed to find. If you take their quantity in general, there are certainly plenty of them. But if you divide the number of errors by the amount of code, it turns out that there are almost no errors. What are these errors? They are the most ordinary ones. Here are several samples:

Code:

V512 A call of the 'memset' function will lead to underflow of the buffer '(exploded)'. platform time_win.cc 116
Code:

void NaCl::Time::Explode(bool is_local, Exploded* exploded) const {
  ...
  ZeroMemory(exploded, sizeof(exploded));
  ...
}

Everybody makes misprints. In this case, an asterisk is missing. It must be sizeof(*exploded).
Code:

V502  Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '-' operator.  views  custom_frame_view.cc  400
Code:

static const int kClientEdgeThickness;
int height() const;
bool ShouldShowClientEdge() const;

void CustomFrameView::PaintMaximizedFrameBorder(gfx::Canvas* canvas) {
  ...
  int edge_height = titlebar_bottom->height() -
                    ShouldShowClientEdge() ? kClientEdgeThickness : 0;
  ...
}

The insidious operator "?:" has a lower priority than subtraction. There must be additional parentheses here:
Code:

int edge_height = titlebar_bottom->height() -
                  (ShouldShowClientEdge() ? kClientEdgeThickness : 0);

A meaningless check.
Code:

V547  Expression 'count < 0' is always false. Unsigned type value is never < 0.  ncdecode_tablegen  ncdecode_tablegen.c  197
Code:

static void CharAdvance(char** buffer, size_t* buffer_size, size_t count) {
  if (count < 0) {
    NaClFatal("Unable to advance buffer by count!");
  } else {
  ...
}

The "count < 0" condition is always false. The protection does not work and some buffer might get overflowed. By the way, this is an example of how static analyzers might be used to search for vulnerabilities. An intruder can quickly find code fragments that contain errors for further thorough investigation. Here is another code sample related to the safety issue:
Code:

V511  The sizeof() operator returns size of the pointer, and not of the array, in 'sizeof (salt)' expression.  common  visitedlink_common.cc  84
Code:

void MD5Update(MD5Context* context, const void* buf, size_t len);

VisitedLinkCommon::Fingerprint VisitedLinkCommon::ComputeURLFingerprint(
  ...
 const uint8 salt[LINK_SALT_LENGTH])
{
  ...
  MD5Update(&ctx, salt, sizeof(salt));
  ...
}

The MD5Update() function will process as many bytes as the pointer occupies. This is a potential loophole in the data encryption system, isn't it? I do not know whether it implies any danger; however, from the viewpoint of intruders, this is a fragment for thorough analysis.

The correct code should look this way:
Code:

MD5Update(&ctx, salt, sizeof(salt[0]) * LINK_SALT_LENGTH);
Or this way:
Code:

VisitedLinkCommon::Fingerprint VisitedLinkCommon::ComputeURLFingerprint(
  ...
 const uint8 (&salt)[LINK_SALT_LENGTH])
{
  ...
  MD5Update(&ctx, salt, sizeof(salt));
  ...
}

One more sample with a misprint:
Code:

V501  There are identical sub-expressions 'host != buzz::XmlConstants::str_empty ()' to the left and to the right of the '&&' operator.  chromoting_jingle_glue  iq_request.cc  248
Code:

void JingleInfoRequest::OnResponse(const buzz::XmlElement* stanza) {
  ...
  std::string host = server->Attr(buzz::QN_JINGLE_INFO_HOST);
  std::string port_str = server->Attr(buzz::QN_JINGLE_INFO_UDP);
  if (host != buzz::STR_EMPTY && host != buzz::STR_EMPTY) {
  ...
}

The port_str variable must be actually checked as well:
Code:

if (host != buzz::STR_EMPTY && port_str != buzz::STR_EMPTY) {
A bit of classics:
Code:

V530  The return value of function 'empty' is required to be utilized.  chrome_frame_npapi  np_proxy_service.cc  293
Code:

bool NpProxyService::GetProxyValueJSONString(std::string* output) {
  DCHECK(output);
  output->empty();
  ...
}

It must be: output->clear();

And here is even the handling of a null pointer:
Code:

V522  Dereferencing of the null pointer 'plugin_instance' might take place. Check the logical condition.  chrome_frame_npapi  chrome_frame_npapi.cc  517
Code:

bool ChromeFrameNPAPI::Invoke(...)
{
  ChromeFrameNPAPI* plugin_instance =
    ChromeFrameInstanceFromNPObject(header);
  if (!plugin_instance && (plugin_instance->automation_client_.get()))
    return false;
  ... 
}

One more example of a check that will never work:
Code:

V547  Expression 'current_idle_time < 0' is always false. Unsigned type value is never < 0.  browser  idle_win.cc  23
Code:

IdleState CalculateIdleState(unsigned int idle_threshold) {
  ...
  DWORD current_idle_time = 0;
  ...
  // Will go -ve if we have been idle for a long time (2gb seconds).
  if (current_idle_time < 0)
    current_idle_time = INT_MAX;
  ...
}

Well, we should stop here. I can continue but it's starting to get boring. Remember that all this only concerns the Chromium itself. But there are also tests with errors like this:
Code:

V554  Incorrect use of auto_ptr. The memory allocated with 'new []' will be cleaned using 'delete'.  interactive_ui_tests  accessibility_win_browsertest.cc  306
Code:

void AccessibleChecker::CheckAccessibleChildren(IAccessible* parent) {
  ...
  auto_ptr<VARIANT> child_array(new VARIANT[child_count]);
  ...
}

There are also plenty of libraries Chromium is actually based on, the total size of libraries being much larger than that of Chromium itself. They also have a lot of interesting fragments. It is clear that code containing errors might not be used anywhere, still they are the errors nonetheless. Consider one of the examples (the ICU library):
Code:

V547 Expression '* string != 0 || * string != '_'' is always true. Probably the '&&' operator should be used here.  icui18n ucol_sit.cpp 242
Code:

U_CDECL_BEGIN static const char* U_CALLCONV
_processVariableTop(...)
{
  ...
  if(i == locElementCapacity && (*string != 0 || *string != '_')) {
    *status = U_BUFFER_OVERFLOW_ERROR;
  }
  ...
}

The "(*string != 0 || *string != '_')" expression is always true. Perhaps it must be: (*string == 0 || *string == '_').

Conclusion



PVS-Studio was defeated. Chromium's source code is one of the best we have ever analyzed. We have found almost nothing in Chromium. To be more exact, we have found a lot of errors and this article demonstrates only a few of them. But if we keep in mind that all these errors are spread throughout the source code with the size of 460 Mbytes, it turns out that there are almost no errors at all.

P.S.

I'm answering to the question: will we inform the Chromium developers of the errors we've found? No, we won't. It is a very large amount of work and we cannot afford doing it for free. Checking Chromium is far from checking Miranda IM or checking Ultimate Toolbox. This is a hard work, we have to study all of the messages and make a decision whether there is an error in every particular case. To do that, we must be knowledgeable about the project. We will this article to the Chromium developers, and should they find it interesting, they will be able to analyze the project themselves and study all the diagnostic messages. Yes, they will have to purchase PVS-Studio for this purpose. But any Google department can easily afford this.

ManzZup 25May2011 21:06

Re: PVS-Studio VS Chromium
 
simply an awesome run though of the thing
got ma fullest attension
+ i once saw a little of the Chromium, but never thought of analyzing because the sight drew me all crazy :)
good work :D

Karpov2007 26May2011 11:41

Re: PVS-Studio VS Chromium
 
Just in case, I want to remind you that the article describes only some of the suspicious places in code. The analysis of the full PVS-Studio report is a huge task and it's better to be performed by the developers themselves. I will now cite some additional suspicious fragments from Chromium and it's libraries:

Code:

V564  The '&' operator is applied to bool type value. You've probably forgotten to include parentheses or intended to use the '&&' operator.  base  platform_file_win.cc  216

#define FILE_ATTRIBUTE_DIRECTORY 0x00000010

bool GetPlatformFileInfo(PlatformFile file, PlatformFileInfo* info) {
  ...
  info->is_directory =
    file_info.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY != 0;
  ...
}
-----------------------------------------
V503  This is a nonsensical comparison: pointer < 0.  browser  profile_impl.cc  169

void GetCacheParameters(ContextType type, FilePath* cache_path,
                        int* max_size) {
  ...
  *max_size = 0;
  if (!base::StringToInt(value, max_size)) {
    *max_size = 0;
  } else if (max_size < 0) {
    *max_size = 0;
  }
  ...
}
-----------------------------------------
V511  The sizeof() operator returns size of the pointer, and not of the array, in 'sizeof (salt)' expression.  browser  visitedlink_master.cc  968
V512  A call of the 'memcpy' function will lead to underflow of the buffer 'salt_'.  browser  visitedlink_master.cc  968

uint8 salt_[LINK_SALT_LENGTH];

VisitedLinkMaster::TableBuilder::TableBuilder(
    VisitedLinkMaster* master,
    const uint8 salt[LINK_SALT_LENGTH])
    : master_(master),
      success_(true) {
  fingerprints_.reserve(4096);
  memcpy(salt_, salt, sizeof(salt));
}
-----------------------------------------
V530  The return value of function 'empty' is required to be utilized.  chrome_frame_ie  protocol_sink_wrap.cc  399

std::wstring url_;

HRESULT ProtData::ReportProgress(IInternetProtocolSink* delegate,
                                ULONG status_code, LPCWSTR status_text) {
  ...
  case BINDSTATUS_REDIRECTING:
    url_.empty();
    if (status_text)
      url_ = status_text;
    break;
  ...
}
-----------------------------------------
V554  Incorrect use of auto_ptr. The memory allocated with 'new []' will be cleaned using 'delete'.  interactive_ui_tests  accessibility_win_browsertest.cc  171

void AccessibleContainsAccessible(...)
{
  ...
  auto_ptr<VARIANT> child_array(new VARIANT[child_count]);
  ...
}
-----------------------------------------
V540  Member 'lpstrFilter' should point to string terminated by two 0 characters.  test_shell_common  test_shell_win.cc  643

bool TestShell::PromptForSaveFile(const wchar_t* prompt_title,
                                  FilePath* result) {
  ...
  OPENFILENAME info = {0};
  ...
  info.lpstrFilter = L"*.txt";
  ...
}
-----------------------------------------
V517 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 800, 808. icui18n msgfmt.cpp 800

UnicodeString&
MessageFormat::toPattern(UnicodeString& appendTo) const {
  ...
  else if (formatAlias == *defaultTimeTemplate) {
    appendTo += ID_TIME;
  }
  ...
  else if (formatAlias == *defaultTimeTemplate) {
    appendTo += ID_TIME;
    appendTo += COMMA;
    appendTo += ID_MEDIUM;
  }
  ...
}

And here:
V517 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 777, 785. icui18n msgfmt.cpp 777
-----------------------------------------
V501 There are identical sub-expressions to the left and to the right of the '&&' operator: !has_audio &&!has_audio libjingle_p2p sessionmessages.cc 308

bool ParseGingleTransportInfos(...)
{
  ...
  bool has_audio = FindContentInfoByName(contents, CN_AUDIO) != NULL;
  bool has_video = FindContentInfoByName(contents, CN_VIDEO) != NULL;

  if (!has_audio && !has_audio) {
    ...
}
-----------------------------------------
V517 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 353, 355. libwebp frame.c 353

void VP8ReconstructBlock(VP8Decoder* const dec) {
  ...
  if (dec->non_zero_ & (1 << n)) {
    VP8Transform(coeffs + n * 16, dst);
  } else if (dec->non_zero_ & (1 << n)) {
    VP8TransformDC(coeffs + n * 16, dst);
  }
  ...
}
-----------------------------------------
V501 There are identical sub-expressions 'sub->negNsSet->value' to the left and to the right of the '==' operator. libxml xmlschemas.c 13949

static int
xmlSchemaCheckCOSNSSubset(...)
{
  ...
  if ((sub->negNsSet != NULL) &&
      (super->negNsSet != NULL) &&
      (sub->negNsSet->value == sub->negNsSet->value))
  return 0;
 ...
}
-----------------------------------------
V501 There are identical sub-expressions 'ir1->operands [0]->type->is_matrix ()' to the left and to the right of the '||' operator. mesa ir_algebraic.cpp 189

bool
ir_algebraic_visitor::reassociate_constant(...)
{
  ...
  if (ir1->operands[0]->type->is_matrix() ||
      ir1->operands[0]->type->is_matrix() ||
      ir2->operands[1]->type->is_matrix() ||
      ir2->operands[1]->type->is_matrix())
  return false;
  ...
}
-----------------------------------------
V501 There are identical sub-expressions to the left and to the right of the '&&' operator: width > 0 && height > 0 && height > 0 mesa teximage.c 2801

void GLAPIENTRY
_mesa_TexSubImage3D(...)
{
  ...
  else if (width > 0 && height > 0 && height > 0) {
  ...
}
-----------------------------------------
V547 Expression 'input.len < 0' is always false. Unsigned type value is never < 0. nss pk11merge.c 491

struct SECItemStr {
  ...
  unsigned int len;
};

static SECStatus
pk11_mergeSecretKey(...)
{
  ...
  if (input.len < 0) {
    rv = SECFailure;
    goto done;
  }
  ...
}
-----------------------------------------
 V564 The '&' operator is applied to bool type value. You've probably forgotten to include parentheses or intended to use the '&&' operator. nss secasn1u.c 121

PRBool SEC_ASN1IsTemplateSimple(const SEC_ASN1Template *theTemplate)
{
  ...
  if (!theTemplate->kind & SEC_ASN1_CHOICE) {
    return PR_FALSE; /* no choice means not simple */
  }
  ...
}
-----------------------------------------
V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '+' operator. ots gdef.cc 278

bool version_2;

bool ots_gdef_parse(...)
{
  ...
  const unsigned gdef_header_end = static_cast<unsigned>(8) +
    gdef->version_2 ? static_cast<unsigned>(2) : static_cast<unsigned>(0);
  ...
}
-----------------------------------------
V501 There are identical sub-expressions 'kKeep_StencilOp == fFrontFailOp' to the left and to the right of the '&&' operator. skia grstencil.h 159

bool isDisabled() const {
  return kKeep_StencilOp == fFrontPassOp  &&
        kKeep_StencilOp == fBackPassOp    &&
        kKeep_StencilOp == fFrontFailOp  &&
        kKeep_StencilOp == fFrontFailOp  &&
        kAlways_StencilFunc == fFrontFunc &&
        kAlways_StencilFunc == fBackFunc;
}
-----------------------------------------
V501 There are identical sub-expressions 'x >= 0' to the left and to the right of the '&&' operator. webcore_platform feconvolvematrix.cpp 289

ALWAYS_INLINE int
FEConvolveMatrix::getPixelValue(PaintingData& paintingData, int x, int y)
{
  if (x >= 0 && x < paintingData.width && x >= 0 && y < paintingData.height)
    return (y * paintingData.width + x) << 2;
  ...
}
-----------------------------------------
V501 There are identical sub-expressions '(bStart >= aStart && bStart <= aEnd)' to the left and to the right of the '||' operator. webcore_remaining spatialnavigation.cpp 236

// This method checks if |start| and |dest| have a partial intersection, either
// horizontally or vertically.
// * a = Current focused node's rect.
// * b = Focus candidate node's rect.
static bool areRectsPartiallyAligned(FocusDirection direction, const IntRect& a, const IntRect& b)
{
    int aStart  = start(direction, a);
    int bStart  = start(direction, b);
    int bMiddle = middle(direction, b);
    int aEnd = end(direction, a);
    int bEnd = end(direction, b);

    // Picture of the partially aligned logic:
    //
    //    Horizontal      Vertical
    // ********************************
    // *  _            *  _ _ _      *
    // * |_|          *  |_|_|_|    *
    // * |_|.... _    *      . .    *
    // * |_|    |_|    *      . .    *
    // * |_|....|_|    *      ._._ _  *
    // *        |_|    *      |_|_|_| *
    // *        |_|    *              *
    // *              *              *
    // ********************************
    //
    // ... and variants of the above cases.
    return ((bStart >= aStart && bStart <= aEnd)
            || (bStart >= aStart && bStart <= aEnd)
            || (bEnd >= aStart && bEnd <= aEnd)
            || (bMiddle >= aStart && bMiddle <= aEnd)
            || (bEnd >= aStart && bEnd <= aEnd));
}
-----------------------------------------
V501 There are identical sub-expressions 'cy ().isRelative ()' to the left and to the right of the '||' operator. webcore_svg svgradialgradientelement.cpp 253

bool SVGRadialGradientElement::selfHasRelativeLengths() const
{
  return cy().isRelative()
      || cy().isRelative()
      || r().isRelative()
      || fx().isRelative()
      || fy().isRelative();
}


Karpov2007 13Oct2011 23:34

Re: PVS-Studio VS Chromium
 
PVS-Studio vs Chromium - Continuation:

http://www.viva64.com/en/b/0113/


All times are GMT +5.5. The time now is 03:17.