parsing an .exe file

Discussion in 'Assembly Language Programming (ALP) Forum' started by waisty, Dec 3, 2006.

  1. waisty

    waisty New Member

    Joined:
    Dec 3, 2006
    Messages:
    12
    Likes Received:
    0
    Trophy Points:
    0
    Occupation:
    student
    Hi. I'm new on this forum, so please pardon me if I break any rules. I'm a final-year computer science student working on my project, and a small part of it requires that I parse an .exe file (using assembly or C) to find out what methods/procedures are used in the program. I've searched everywhere for this, to no avail. On a thread (not posted by me) on another site, the question was considered a prank post, as it seemed impossible to the members. I assure you, this is no prank, and any replies would be highly appreciated. Thanks -- waisty
     
  2. shabbir

    shabbir Administrator Staff Member

    Joined:
    Jul 12, 2004
    Messages:
    15,375
    Likes Received:
    388
    Trophy Points:
    83
  3. waisty

    waisty New Member

    Thanks, shabbir, for your reply. Although the code works perfectly for changing the icon of an .exe file, it doesn't really help with actual parsing. What I want to do is parse the .exe file to check its internal structure. More precisely, I need to know where in the file each module (method) starts and stops, and where each segment starts and stops. Any help would be greatly appreciated.
     
  4. DaWei

    DaWei New Member

    Joined:
    Dec 6, 2006
    Messages:
    835
    Likes Received:
    5
    Trophy Points:
    0
    Occupation:
    Semi-retired EE
    Location:
    Texan now in Central NY
    Home Page:
    http://www.daweidesigns.com
    Have you investigated the formats for the various kinds of .exe files? There are a few. Not all formats carry as extensive a set of information as others. None that I know of will give you the location of every procedure/method/function. Some will give you the location of imported modules/names. I would suggest that you review your assignment to make sure you have interpreted it correctly.
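
    To make the "where does each segment start and stop" part concrete: for the Win32 PE format, the section table in the header records a file offset and size for every section. The following is only a rough sketch, assuming a well-formed PE file; the function name list_pe_sections and the dictionary keys are made up for illustration, and plain DOS (MZ) or NE executables use different layouts and are simply rejected here.

```python
import struct

IMAGE_SCN_CNT_CODE = 0x00000020      # section contains code
IMAGE_SCN_MEM_EXECUTE = 0x20000000   # section is executable


def list_pe_sections(data):
    """Return one dict per section header found in the PE image held in `data`."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: file offset of the PE signature, stored at 0x3C in the DOS header
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature (plain DOS or NE executable?)")
    # COFF file header: 20 bytes, immediately after the 4-byte signature
    (_machine, n_sections, _stamp, _symtab, _nsyms,
     opt_size, _flags) = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    # The section table follows the optional header; each entry is 40 bytes
    table = pe_offset + 4 + 20 + opt_size
    sections = []
    for i in range(n_sections):
        (name, vsize, vaddr, raw_size, raw_ptr,
         _rel, _lines, _nrel, _nlines, flags) = struct.unpack_from(
            "<8sIIIIIIHHI", data, table + 40 * i)
        sections.append({
            "name": name.rstrip(b"\0").decode("ascii", "replace"),
            "file_start": raw_ptr,               # where the section starts in the file
            "file_end": raw_ptr + raw_size,      # one past its last byte in the file
            "rva": vaddr,                        # load address relative to the image base
            "virtual_size": vsize,
            "is_code": bool(flags & (IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE)),
        })
    return sections
```

    Calling list_pe_sections on the bytes of a file read in binary mode gives one entry per section. The is_code flag is what separates code sections from data sections, and a file may well have more than one section with it set. Note that this gives section boundaries only; nothing in the section table says where individual procedures begin.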
     
  5. waisty

    waisty New Member

    I have done some investigation into the .exe file formats, and I know that none of them will tell you the location of every procedure. But I do know that there must be some line of machine code that signifies the beginning and end of a procedure. Also, it's not an assignment; it's a final-year project I chose for myself. Thanks for your continued help.
     
  6. DaWei

    DaWei New Member

    Actually, there is no line of machine code that signifies the beginning of a procedure. The code that signifies the return from a procedure might or might not be at the end, and it might occur more than once. A procedure is called by saving the current point of execution (wherever it may be) and setting the instruction pointer to the value representing the start of the procedure. The start of the procedure could be any set of values at all. If you could find all the calls, then you could interpret the address that follows each one and infer the location of the procedure. Unfortunately, that address is generally a relative value in an exe file.
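
    To illustrate what "relative" means here: on 32-bit x86, the ordinary near call is the opcode E8 followed by a signed 32-bit displacement, and the displacement is counted from the address of the instruction after the call. A minimal sketch, assuming you already know that a call really does sit at the given offset (the function name and the base_address parameter are only for illustration):

```python
import struct


def near_call_target(code, offset, base_address):
    """Decode an x86 `E8 rel32` near call at `offset`; return the address it calls."""
    if code[offset] != 0xE8:
        raise ValueError("no E8 opcode at this offset")
    (rel32,) = struct.unpack_from("<i", code, offset + 1)  # signed, little-endian
    next_instruction = base_address + offset + 5           # opcode + 4 displacement bytes
    return (next_instruction + rel32) & 0xFFFFFFFF         # wrap like a 32-bit register
```

    For example, with the bytes E8 03 00 00 00 at offset 0 and a base address of 0x401000, the target is 0x401008. A displacement of FA FF FF FF (that is, -6) points backwards instead.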

    If you knew exactly which language, and which compiler of that language, produced the code, then you could presume some fairly standard overhead code, and look for that. Nothing in the world prevents blocks of data from containing those same values, however. The total effectiveness of such a process, for all exes, would probably border on crap.
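
    As an example of such overhead code: many 32-bit x86 compilers open a function with push ebp / mov ebp, esp, which is commonly seen as the bytes 55 8B EC (typical of MSVC) or 55 89 E5 (typical of GCC). A rough sketch of scanning for those patterns follows; the names are invented for illustration, and optimised builds often omit this prologue entirely.

```python
PROLOGUES = (
    bytes.fromhex("558bec"),  # push ebp / mov ebp, esp (encoding commonly seen from MSVC)
    bytes.fromhex("5589e5"),  # same two instructions, encoding commonly seen from GCC
)


def find_prologue_candidates(code):
    """Return sorted offsets where a known prologue byte pattern occurs in `code`."""
    hits = []
    for pattern in PROLOGUES:
        start = code.find(pattern)
        while start != -1:
            hits.append(start)
            start = code.find(pattern, start + 1)
    return sorted(hits)


# Two small functions, then `mov eax, 0x00EC8B55`, whose immediate happens
# to contain the bytes 55 8B EC.
sample = bytes.fromhex("558becc35589e5c3b8558bec00")
print(find_prologue_candidates(sample))  # [0, 4, 9]
```

    Offsets 0 and 4 are real function starts in this sample. Offset 9 is not: it lies inside the operand of a mov instruction, which is exactly the kind of false match described above.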

    Since I'm having to point all these things out, you are apparently somewhat of a novice (not a derogatory comment, just a presumption). It's possible that you've let your ambition overload your abilities, at this point.
     
  7. waisty

    waisty New Member

    Thanks for your last reply. I am indeed something of a novice at assembly language programming, but I have quite a bit of experience in high-level programming. You stated in your last reply that a procedure is called by saving the current point of execution and setting the instruction pointer to the value representing the start of the procedure. For each type of machine, there must be a specific machine code (or set of codes) that does the calling, and another set of machine codes for returning. Once I have these machine codes for the different machines, I can easily parse the .exe for the information I want. Since disassemblers do this all the time, it must be possible. Thanks again for your previous reply and continued help.
     
  8. DaWei

    DaWei New Member

    The problem is that even the best disassemblers screw up. The extent of this screwup is determined by the extent to which non-code (inline data) is allowed to exist in the same area as the code.

    Suppose, also, that the call instruction is represented by C3 nn nn, where nn nn is the address of the procedure to be called. Suppose, further, that some instruction wants to load a register with C3. You find this C3 by scanning the code. The instruction that follows would be interpreted as nn nn, rather than an instruction. The best that you can hope for is that screwups would amount to less than xx%, where xx is some value of failure which you consider acceptable. I would question quite severely your evaluation of an acceptable xx. If for no other reason, just to see you sweat and attempt to explain your conclusions.
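
    The C3 above is a made-up encoding for the sake of argument. On real 32-bit x86 the near relative call is E8 followed by a 4-byte displacement (and C3 happens to be the near return), but the same trap exists. A small sketch of the naive byte scan, with a hypothetical function name:

```python
import struct


def naive_call_scan(code):
    """Treat every E8 byte as a call opcode; return (offset, target_offset) pairs."""
    hits = []
    for i in range(len(code) - 4):  # need 4 displacement bytes after the opcode
        if code[i] == 0xE8:
            (rel32,) = struct.unpack_from("<i", code, i + 1)
            hits.append((i, i + 5 + rel32))
    return hits


# mov al, 0xE8 ; nop ; nop ; nop ; nop ; ret
# There is no call in this code at all.
sample = bytes.fromhex("b0e890909090c3")
hits = naive_call_scan(sample)
```

    The scan reports one "call" at offset 1, which is really the immediate operand of the mov, and it reads the four nops that follow as a displacement, giving a target far outside the code.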
     
  9. waisty

    waisty New Member

    Well, every instruction on a given machine occupies a specific number of bytes. With this information, it would be easy to know where each instruction stops and where the next one starts. So, in the case of C3 nn nn versus xx C3, if the file is properly parsed, it would be easy to tell whether the C3 in question is an instruction or an operand, and there should be no error. Even where errors do occur, a seemingly high error rate would be tolerable, because I am trying to reconstruct virus-infected files, which would otherwise have to be deleted. Thanks for your help. I look forward to your next reply.
     
  10. DaWei

    DaWei New Member

    Again, you're presuming that there is no data embedded in the code. Since you are so up on the ways to do it, however, I don't understand why you are posting to find an answer. Just write your disassembler, parse for the requisite codes (be sure to distinguish between absolute and relative calls), and turn in your project.
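
    Both sides of this exchange can be shown with a toy length-aware decoder. This is only a sketch: the table covers a tiny hand-picked subset of 32-bit x86 opcodes, and treating any unknown byte as one opaque byte is a simplification for the example, not how a real disassembler decodes x86.

```python
# Instruction lengths for a small, hand-picked subset of 32-bit x86 opcodes.
LENGTHS = {
    0x90: 1,  # nop
    0xC3: 1,  # ret
    0xCC: 1,  # int3
    0xE8: 5,  # call rel32
    0xE9: 5,  # jmp rel32
    0xEB: 2,  # jmp rel8
}
LENGTHS.update({op: 2 for op in range(0xB0, 0xB8)})  # mov r8, imm8
LENGTHS.update({op: 5 for op in range(0xB8, 0xC0)})  # mov r32, imm32


def linear_sweep(code):
    """Decode from offset 0, one instruction after another; return call offsets."""
    calls = []
    i = 0
    while i < len(code):
        op = code[i]
        if op == 0xE8:
            calls.append(i)
        i += LENGTHS.get(op, 1)  # unknown byte: step over it as a single data byte
    return calls


# Pure code: mov al, 0xE8 ; nop x4 ; ret. The E8 is correctly seen as an operand.
clean = bytes.fromhex("b0e890909090c3")

# Code with one inline data byte:
#   0: jmp short 3      (EB 01, skips the data byte)
#   2: db 0xB8          (data, but it looks like the start of mov eax, imm32)
#   3: call 8           (E8 00 00 00 00)
#   8: ret
mixed = bytes.fromhex("eb01b8e800000000c3")

print(linear_sweep(clean))  # []
print(linear_sweep(mixed))  # []
```

    The first result is right: tracking instruction lengths does tell an operand from an opcode. The second is wrong: the data byte at offset 2 is decoded as a 5-byte mov, which swallows the real call at offset 3, so the call is never reported.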
     
  11. waisty

    waisty New Member

    Thanks for the advice. I'll get to work on it. But as a newbie, I don't know how to distinguish between absolute and relative calls. Are there different instructions for the two? Thanks for your help.
     
  12. waisty

    waisty New Member

    What does it mean to have data embedded in code? Can you have a db or a dw in the .code segment?
    And how do I distinguish between absolute and relative calls? Are there different opcodes for the two?
    I also need to know which segment is which, so that I don't go parsing the stack segment or something like that. And what do I do in cases where there is more than one .code segment? Thanks for all your help.
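
    On the opcode question, taking 32-bit x86 as the example machine: yes, the encodings differ. E8 is the near call with a relative displacement, 9A is the far call with an absolute segment:offset, and FF covers the indirect calls, where the reg field of the following ModRM byte selects near (/2) or far (/3) and the absolute target comes from a register or memory. A small sketch, assuming the offset passed in really is the start of an instruction; the function name and the returned strings are invented for illustration:

```python
def classify_call(code, offset=0):
    """Name the kind of x86 call starting at `offset`, or return None if it is not one."""
    op = code[offset]
    if op == 0xE8:
        return "near relative (E8 rel32)"
    if op == 0x9A:
        return "far absolute (9A ptr16:32)"
    if op == 0xFF and offset + 1 < len(code):
        reg = (code[offset + 1] >> 3) & 7  # bits 5..3 of the ModRM byte
        if reg == 2:
            return "near absolute indirect (FF /2)"
        if reg == 3:
            return "far absolute indirect (FF /3)"
    return None
```

    For instance, FF 15 followed by a 4-byte address (call through a pointer in memory, the form often used for imported functions) and FF D0 (call eax) both come out as near absolute indirect, while FF 25 is an indirect jmp and gives None.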
     
