Compiling Software on Solaris

For complete information about using the C compiler, the cc command, and its options, see the Oracle Solaris Studio documentation. The CC command invokes each of the compilation components automatically unless you use command-line options to specify otherwise.

You can type CC -flags to see short descriptions of all the possible CC compiler options. Files whose names do not end with one of the recognized source-file suffixes are treated as object files or libraries and are handed over to the link editor. By default, the files are compiled and linked in the order given to produce an executable file named a.out. You can also compile source files separately and link them later; for example, you might compile the two source files test1.C and test2.C separately and then link them into an executable file called test, as in the sketch below. The compiler can perform both automatic and explicit loop parallelization to enable your programs to run efficiently on multiprocessor systems.
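A minimal sketch of those CC invocations (the file names are illustrative; -c compiles without linking, and -o names the output file):

    CC prog.C                      # compile and link one file into ./a.out
    CC -c test1.C                  # compile only, producing test1.o
    CC -c test2.C                  # compile only, producing test2.o
    CC -o test test1.o test2.o     # link the object files into ./test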

The Fortran compiler offers compatibility with the Fortran 77, Fortran 90, and Fortran 95 standards. You can type f95 -flags to see short descriptions of all the possible compiler options.
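A minimal sketch of an f95 invocation (the file name is illustrative):

    f95 -o myprog myprog.f95       # compile and link into ./myprog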

The source files must be one or more Fortran source file names ending in a recognized Fortran suffix such as .f90 or .f95. For complete information about using the Fortran 95 compiler, and a description of the f95 command and its options, see the Oracle Solaris Studio documentation. Oracle Solaris Studio compilers provide significantly more information than other compilers to help you understand your code.

With optimization, the compilers insert commentary describing the transformations performed on the code, any obstacles to parallelization, operation counts for loop iterations, and so forth. The compiler commentary can be displayed in tools such as the Performance Analyzer.

For Java applications, the machine representation looks the same as that of applications written in traditional languages.

The call stack shows JVM frames, native frames, and compiled-method frames. Some of the JVM frames represent transition code between interpreted Java, compiled Java, and native code.

Source from compiled methods is shown against the Java source; the data represents the specific instance of the compiled method selected. Disassembly for compiled methods shows the generated machine assembler code, not the Java bytecode. Caller-callee relationships show all overhead frames, and all frames representing the transitions between interpreted, compiled, and native methods.

The Timeline in the machine representation shows bars for all threads, LWPs, or CPUs, and the call stack in each is the machine-representation call stack.

The OpenMP specification, however, does not describe some implementation details that may be important to users, and the actual implementation from Oracle is such that directly recorded profiling information does not easily allow the user to understand how the threads interact.

When a subroutine contains a loop, the program executes the code inside the loop repeatedly until the loop exit criterion is reached. The execution then proceeds to the next sequence of code, and so forth.
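For example, a simple serial loop in C (a minimal sketch; the function name is hypothetical):

    /* Serial summation: the loop body executes repeatedly until the
       exit criterion (i == n) is reached, then control falls through
       to the statement after the loop. */
    double sum_array(const double *a, int n)
    {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }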

When the program is parallelized with OpenMP (or by autoparallelization), the behavior is different. An intuitive model of the parallelized program has the main, or master, thread executing just as a single-threaded program.

When it reaches a parallel loop or parallel region, additional slave threads appear, each a clone of the master thread, with all of them executing the contents of the loop or parallel region, in parallel, each for different chunks of work. When all chunks of work are completed, all the threads are synchronized, the slave threads disappear, and the master thread proceeds.
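The same loop, parallelized with an OpenMP directive (a minimal sketch; with the Oracle Solaris Studio compilers, OpenMP directives are recognized when you compile with -xopenmp):

    /* Parallel summation: at the pragma, slave threads appear and each
       sums a chunk of the iterations; the reduction combines the
       partial sums, the threads synchronize at the implicit barrier at
       the end of the loop, and only the master thread proceeds. */
    double sum_array_parallel(const double *a, int n)
    {
        double sum = 0.0;
        int i;

        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < n; i++)
            sum += a[i];

        return sum;
    }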

The actual behavior of the parallelized program is not so straightforward. When the compiler generates code for a parallel region or loop (or any other OpenMP construct), the code inside it is extracted and made into an independent function, called an mfunction in the Oracle implementation. It may also be referred to as an outlined function or a loop-body-function.

The name of the mfunction encodes the OpenMP construct type, the name of the function from which it was extracted, and the line number of the source line at which the construct appears. The names of these functions are shown in the Analyzer's Expert mode and Machine mode in the following form, where the name in brackets is the actual symbol-table name of the function:
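For illustration only (the function name, line number, and bracketed symbol-table name here are hypothetical), such an entry might look like:

    foo -- OMP parallel region from line 12 [_$p1C12.foo_]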

In the following discussion, all of these are referred to generically as parallel regions. Each thread executing the code within the parallel loop can invoke its mfunction multiple times, with each invocation doing a chunk of the work within the loop. When all the chunks of work are complete, each thread calls synchronization or reduction routines in the library; the master thread then continues, while the slave threads become idle, waiting for the master thread to enter the next parallel region.
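A sketch of how those chunks can arise in source code (illustrative; process is a hypothetical per-iteration function, and the dynamic schedule makes each mfunction invocation pick up another chunk):

    static void process(long i) { (void)i; /* per-iteration work */ }

    void run(long n)
    {
        long i;
        /* Each thread repeatedly grabs a 1000-iteration chunk until no
           chunks remain, then waits at the implicit barrier. */
        #pragma omp parallel for schedule(dynamic, 1000)
        for (i = 0; i < n; i++)
            process(i);
    }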

All of the scheduling and synchronization are handled by calls to the OpenMP runtime. During its execution, the code within the parallel region might be doing a chunk of the work, or it might be synchronizing with other threads or picking up additional chunks of work to do. It might also call other functions, which may in turn call still others. A slave thread (or the master thread) executing within a parallel region might itself, or from a function it calls, act as a master thread and enter its own parallel region, giving rise to nested parallelism.
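A minimal sketch of nested parallelism in C (illustrative; omp_set_nested and omp_get_ancestor_thread_num are standard OpenMP routines, and nested regions typically must be enabled explicitly):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        omp_set_nested(1);                    /* allow nested teams */

        #pragma omp parallel num_threads(2)   /* outer region */
        {
            /* Each outer thread now acts as a master and spawns its
               own inner team, giving rise to nested parallelism. */
            #pragma omp parallel num_threads(2)
            {
                printf("outer thread %d, inner thread %d\n",
                       omp_get_ancestor_thread_num(1),
                       omp_get_thread_num());
            }
        }
        return 0;
    }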

The Analyzer collects data based on statistical sampling of call stacks, aggregates its data across all threads, and shows metrics of performance, based on the type of data collected, against functions, callers and callees, source lines, and instructions. The Analyzer presents information on the performance of OpenMP programs in one of three modes: User mode, Expert mode, and Machine mode.

The User mode presentation of the profile data attempts to present the information as if the program really executed according to the intuitive model described in Overview of OpenMP Software Execution.

The actual data, shown in Machine mode, captures the implementation details of the runtime library, libmtsk. Expert mode shows a mix of data altered to fit the model and the actual data. In User mode, the presentation of profile data is altered to match the model better, and differs from the recorded data and the Machine mode presentation in three ways:

- Artificial functions are constructed representing the state of each thread from the point of view of the OpenMP runtime library.
- Call stacks are manipulated to report data corresponding to the model of how the code runs, as described above.
- Two additional metrics of performance are constructed for clock-based profiling experiments, corresponding to time spent doing useful work and time spent waiting in the OpenMP runtime.

Artificial functions are constructed and put onto the User mode and Expert mode call stacks reflecting events in which a thread was in some state within the OpenMP runtime library.

When a thread is in an OpenMP runtime state corresponding to one of the artificial functions, the artificial function is added as the leaf function on the stack. For OpenMP 3.0 experiments, the artificial function is replaced by an OpenMP Overhead metric. For OpenMP experiments, User mode shows reconstructed call stacks similar to those obtained when the program is compiled without OpenMP. The goal is to present profile data in a manner that matches the intuitive understanding of the program rather than showing all the details of the actual processing.
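As an illustration (the user function names are hypothetical, and the exact artificial-function names vary with the OpenMP runtime release), a User mode call stack for a thread waiting at an implicit barrier might look like:

    main
      foo
        <OMP-implicit_barrier>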

Time is accumulated in OpenMP Work whenever a thread is executing from the user code, whether in serial or parallel. Time is accumulated in OpenMP Wait whenever a thread is waiting for something before it can proceed, whether the wait is a busy-wait (spin-wait) or a sleep. However, Expert mode separately shows compiler-generated mfunctions that represent parallelized loops, tasks, and so on.

In User mode, these compiler-generated mfunctions are aggregated with user functions. Machine mode shows native call stacks for all threads and outline functions generated by the compiler. The real call stacks of the program during various phases of execution are quite different from the ones portrayed above in the intuitive model. The Machine mode shows the call stacks as measured, with no transformations done, and no artificial functions constructed.

The clock-profiling metrics are, however, still shown. In each of the call stacks below, libmtsk represents one or more frames in the call stack within the OpenMP runtime library. The details of which functions appear and in which order change from release to release of OpenMP, as does the internal implementation of the code for a barrier or for performing a reduction.

Before the first parallel region is entered, there is only the one thread, the master thread. The call stack is identical to that in User mode and Expert mode. While the threads are executing in the parallel region, frames from the OpenMP runtime appear between foo and the calls to the parallel region code, foo-OMP.... Unlike when the threads are executing in the parallel region, when the threads are waiting at a barrier there are no frames from the OpenMP runtime between foo and the parallel region code, foo-OMP.... The reason is that the real execution does not include the OMP parallel region function, but the OpenMP runtime manipulates registers so that the stack unwind shows a call from the last-executed parallel region function to the runtime barrier code.
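As a sketch (the function names are illustrative, and libmtsk again stands for one or more OpenMP runtime frames), the two cases unwind differently:

    Executing in the parallel region:  main -> foo -> libmtsk -> foo-OMP...
    Waiting at the barrier:            main -> foo -> foo-OMP... -> libmtsk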

Without this stack manipulation, there would be no way to determine which parallel region is related to the barrier call in Machine mode. Call stacks might not be unwound in the following circumstances:

- The stack has been corrupted by the user code; if so, the program might core dump, or the data collection code might core dump, depending on exactly how the stack was corrupted.
- The user code does not follow the standard ABI conventions for function calls.
- The call stack contains more frames than the Collector has space for, so the Collector cannot completely unwind the call stack.

If you generate intermediate files using the -E or -P compiler options, the Analyzer uses the intermediate file for annotated source code, not the original source file.
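For example (the file name is hypothetical):

    cc -P prog.c    # preprocess only, writing prog.i without #line directives
    cc -E prog.c    # preprocess to standard output, including #line directives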

The #line directives generated with -E can cause problems in the assignment of metrics to source lines. A special line appears in the annotated source if there are instructions from a function that do not have line numbers referring to the source file that was compiled to generate the function. Line numbers can be absent in the following circumstances:

- The debugging information was stripped after compilation, or the executables or object files that contain the information were moved, deleted, or subsequently modified.
- The function contains code that was generated from include files rather than from the original source file.


