Computer Organisation and Architecture Homework Help

Boost your journey with 24/7 access to skilled experts, offering unmatched computer organisation and architecture homework help

Recently Asked Computer Organisation and Architecture Questions

Expert help when you need it

Q1: Operations on vectors with length < 64 elements require:
a. The Exception Program Counter (marked wrong)
b. None of the choices
c. Multi-lane convergence
d. Branch instructions and some arithmetic instructions
e. Loop statements
f. The Vector Length Register
Q2: Consider the following loop:
for (i=0; i<64; i++) X[i] = a * X[i] + b;
Here is the assembly code for the loop. Assume that prior to the loop, i is in R1, 64*4 is in R2, a is in F0, and b is in F2.
10: ld F4, X(R1)
11: mul F4, F4, F0
12: add F4, F4, F2
13: st F4, X(R1)
14: addi R1, R1, #4
15: bne R1, R2, 10
Consider a 32-element vector processor. The processor has four fully pipelined vector execution units: a 2-cycle load unit, a 2-cycle store unit, a 2-cycle FP adder, and a 4-cycle FP multiplier. The vector processor does not support chaining. How long would the original loop take to execute on this processor? Ignore overlaps between multiple vector chains and the scalar code for setting up each group of vector operations.
a. 200 cycles
b. 48 cycles
c. 148 cycles
d. 32 cycles
e. 74 cycles
f. 120 cycles
g. 100 cycles
h. 52 cycles
i. 150 cycles
Q3: Now assume that the vector processor supports chaining. How long would the original loop take to execute on this processor?
a. 52 cycles
b. 148 cycles
c. 74 cycles (marked wrong)
d. 200 cycles (marked wrong)
e. 32 cycles
f. 48 cycles
g. 120 cycles
h. 100 cycles
i. 150 cycles
Q6: Design and develop a piece of software which interacts directly with computer hardware, including parallel architectures. You are required to deliver a software solution with a report (1500 words). You should ensure the following are included in your development (this list is not exhaustive):
Part A (Design, implement and evaluate programs):
• You can select an application of your choice and parallelize it (e.g. image filtering, discrete wavelet transform, matrix multiplication, discrete cosine transform).
• You can use any programming language (Python, Java, C/C++, etc.) with which you are conversant, and submit your source code.
• You are free to use any hardware (CPU, GPU, or APU).
• You are free to use any operating system (Linux, Windows, etc.).
• You can use external libraries such as OpenMP, OpenCL, CUDA, etc.
Part B (Report, 1500 words): You are required to submit a report of about 1500 words along with the code (both the sequential version and the parallel version). You also need to provide a demo video/presentation showing your code working. Your report should contain at least the following information:
• Summary or Introduction
• Programming language and hardware details: in this section, include details about the programming language and hardware. This is also the section in which to mention external libraries.
Q7: For a vector addition, assume that the vector length is 8000, each thread calculates one output element, and the thread block size is 1024 threads. The programmer configures the kernel launch to have a minimal number of thread blocks to cover all output elements. How many threads will be in the grid?
(a) 8000
(b) 8196
(c) 8192
(d) 8200
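
The arithmetic behind this kind of launch-configuration question can be sketched in a few lines of Python. This is only an illustration, not the tutor's answer: the function name `grid_threads` is hypothetical, and it assumes the usual ceiling-division rule for picking the minimal number of blocks.

```python
def grid_threads(num_elements, block_size):
    """Return (number of blocks, total threads in the grid).

    Assumes the minimal block count that covers every element,
    i.e. ceil(num_elements / block_size).
    """
    num_blocks = (num_elements + block_size - 1) // block_size  # ceiling division
    return num_blocks, num_blocks * block_size


if __name__ == "__main__":
    blocks, threads = grid_threads(8000, 1024)
    print(f"{blocks} blocks, {threads} threads in the grid")
```

Under these assumptions the grid contains more threads than elements, which is why such kernels normally guard the access with a bounds check.
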
Q8: Assume that a kernel is launched with 1000 thread blocks, each of which has 512 threads. If a variable is declared as a shared memory variable, how many versions of the variable will be created through the lifetime of the execution of the kernel?
(a) 1
(b) 1000
(c) 512
(d) 512000
Q9: [5 Pts: 2, 3 pts, 10 minutes] 1. Write a CUDA kernel incrementing a float array A of size N for a 1D grid, using 1D thread blocks, and assuming that each thread increments one element.
Solution:
__global__ void increment_On_Device(float *A_d, int N) {
}
Q10: [6 Pts: 3 each, 20 minutes] Consider the following array: [4 6 7 1 2 8 5 2].
a. Perform a parallel inclusive prefix scan on the array using the Kogge-Stone algorithm. Report the intermediate states of the array after each step. How many add operations are performed?
Solution:
b. Repeat using the work-efficient algorithm (Brent-Kung).
Solution:
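
As a rough illustration of part (a), the sketch below simulates the Kogge-Stone steps sequentially in Python and records the array after each step. It assumes the garbled array in the question reads [4 6 7 1 2 8 5 2]; the function name `kogge_stone_inclusive_scan` is hypothetical, and a real GPU version would run each step's additions in parallel with a barrier between steps.

```python
def kogge_stone_inclusive_scan(values):
    """Simulate a Kogge-Stone inclusive scan.

    Returns (states, total_adds): the array after each step and the
    number of additions performed across all steps.
    """
    data = list(values)
    states = []
    total_adds = 0
    stride = 1
    while stride < len(data):
        previous = data[:]  # every "thread" reads the values from the prior step
        for i in range(stride, len(data)):
            data[i] = previous[i] + previous[i - stride]
            total_adds += 1
        states.append(data[:])
        stride *= 2
    return states, total_adds


if __name__ == "__main__":
    # Assumed reading of the array in the question.
    states, adds = kogge_stone_inclusive_scan([4, 6, 7, 1, 2, 8, 5, 2])
    for step, state in enumerate(states, start=1):
        print(f"after step {step}: {state}")
    print(f"add operations: {adds}")
```

For the assumed input this takes three steps (strides 1, 2, 4) and 7 + 6 + 4 = 17 additions, ending at [4, 10, 17, 18, 20, 28, 33, 35].
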
Q11: [3 Pts, 10 minutes] Assume we have a matrix of 80 by 100 and we want to cover it with thread blocks of size 32 by 32. Analyze this case. What is the performance impact of divergent warps?
Solution:
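
One way to reason about this case is to count, warp by warp, how many lanes pass a boundary check. The Python sketch below does that under stated assumptions that are not in the question: the matrix has 80 rows and 100 columns, threadIdx.x maps to columns, warps are 32 consecutive threads in row-major order within a block, and the kernel guards work with `row < rows and col < cols`. The function name `classify_warps` is hypothetical.

```python
WARP_SIZE = 32


def classify_warps(rows, cols, block_x, block_y):
    """Count fully active, divergent, and fully idle warps for a 2D launch."""
    grid_x = (cols + block_x - 1) // block_x
    grid_y = (rows + block_y - 1) // block_y
    threads_per_block = block_x * block_y
    counts = {"grid": (grid_x, grid_y), "full": 0, "divergent": 0, "idle": 0}
    for by in range(grid_y):
        for bx in range(grid_x):
            for warp_start in range(0, threads_per_block, WARP_SIZE):
                warp_end = min(warp_start + WARP_SIZE, threads_per_block)
                lanes = warp_end - warp_start
                active = 0
                for tid in range(warp_start, warp_end):
                    tx, ty = tid % block_x, tid // block_x
                    row, col = by * block_y + ty, bx * block_x + tx
                    if row < rows and col < cols:  # assumed boundary guard
                        active += 1
                if active == lanes:
                    counts["full"] += 1
                elif active == 0:
                    counts["idle"] += 1
                else:
                    counts["divergent"] += 1
    return counts


if __name__ == "__main__":
    print(classify_warps(rows=80, cols=100, block_x=32, block_y=32))
```

With this mapping the grid is 4 by 3 blocks (12,288 threads for 8,000 elements); 240 warps are fully active, 80 diverge at the right edge, and 64 are entirely idle. Swapping which dimension maps to x changes the counts, so the mapping should be stated in any written answer.
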
Q12: [8 Pts: 1 pt. each, 15 minutes] Indicate whether the following statements are true or false.
1. [ ] Functions annotated with the __device__ qualifier may be called on the host or the device.
2. [ ] Page faults cannot be handled by software because the overhead is too large.
3. [ ] Virtual memory space has to be bigger than the physical memory space.
4. [ ] You can have a miss in the TLB, a hit in the page table, and a miss in the cache for a single memory access.
5. [ ] Shared memory in CUDA is accessible to both the host and the GPU.
6. [ ] In the case of warp divergence, all possible execution paths are run by all threads in a warp serially so that thread instructions do not diverge.
7. [ ] All thread blocks involved in the same computation use the same kernel.
8. [ ] Is it possible to multiply two 1024x1024 matrices using a tiled matrix multiplication code with 1,024 thread blocks on a device with a block size of 512 threads? Note that each thread in a thread block calculates one element of the result matrix.
Q13: Question 1 (20 pts). Implement a 32x64 register file (32 registers; each register is 64 bits wide). Below is a specification of the register file:
• Inputs Ra and Rb are read register indices. Input Ra indexes the register whose value is on BusA, and input Rb indexes the register whose value is on BusB.
• Input Rw is the write register index.
• When RegWr is high, the data on BusW is stored in the register specified by Rw, on the negative (falling) clock edge. Register reading should occur after the register write (on the negative clock edge), but before the positive clock edge.
• Register 31 must always read zero, even if it has not been written to.
• The RegisterFile module should have the following interface:
module RegisterFile(BusA, BusB, BusW, RA, RB, RW, RegWr, Clk);
    output [63:0] BusA;
    output [63:0] BusB;
    input [63:0] BusW;
    input [4:0] RA;
    input [4:0] RB;
    input [4:0] RW;
    input RegWr;
    input Clk;
    reg [63:0] registers [31:0];
[Block diagram: 32x64 register file with inputs Ra, Rb, Rw, BusW, RegWr, and Clk, and outputs BusA and BusB]
Save your new register file as RegisterFile.v. Test your code against the provided testbench (registerfile tb.v) to make sure it is working (if you are not working in Vivado, you may need to add an include "RegisterFile.v" command). Demo for your TA. Attach a zip/tar file containing your completed module along with a screenshot of the waveform.
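
Before writing the Verilog, it can help to pin down the intended behavior with a small software reference model. The Python sketch below is only that: a behavioral model under the specification quoted above, not the Verilog solution. The class and method names are hypothetical, and the falling clock edge is modeled as an explicit method call.

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide


class RegisterFileModel:
    """Behavioral reference model of a 32x64 register file."""

    def __init__(self):
        self.registers = [0] * 32

    def _read_one(self, index):
        # Register 31 always reads zero, whatever has been written to it.
        return 0 if index == 31 else self.registers[index]

    def read(self, ra, rb):
        """Return (BusA, BusB) for read indices ra and rb."""
        return self._read_one(ra), self._read_one(rb)

    def falling_edge(self, bus_w, rw, reg_wr):
        """Model the negative clock edge: write BusW to register rw if RegWr is high."""
        if reg_wr:
            self.registers[rw] = bus_w & MASK64


if __name__ == "__main__":
    rf = RegisterFileModel()
    rf.falling_edge(bus_w=0xDEAD, rw=5, reg_wr=True)
    rf.falling_edge(bus_w=0x1234, rw=31, reg_wr=True)
    print(rf.read(5, 31))
```

A model like this is mainly useful for generating expected values to compare against the testbench waveform; it does not capture timing between the falling and rising edges.
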
Q14: Question 2 (20 pts). Write a behavioral model to calculate the next PC for an instruction. It will use information from the processor control module and the ALU to determine the destination for the next PC. It will contain two adders for calculating the two possible NextPC values and choose between the two possibilities using the logic depicted below.
[Figure: next-PC logic showing the Uncondbranch and Branch control signals, the ALU Zero output, a shift-left-2 unit, and an adder producing the branch target]
Use the following module interface (which you can find in the provided file):
module NextPCLogic(NextPC, CurrentPC, SignExtImm64, Branch, ALUZero, Uncondbranch);
    input [63:0] CurrentPC, SignExtImm64;
    input Branch, ALUZero, Uncondbranch;
    output [63:0] NextPC;
    reg [63:0] tmp;
    /* write your code here */
endmodule
Branch (CBZ) is true if the current instruction is a conditional branch instruction, Uncondbranch is true if the current instruction is an unconditional branch (B), and ALUZero is the Zero output of the ALU. A template and testbench are provided (NextPClogic.v and NextPClogicTest.v; if you work in Vivado, comment out the include command at the very top). Complete the next PC logic, and add a few test cases to the testbench to improve it. Demonstrate your program to the TA. Attach a zip/tar file containing your completed module, testbench, and a screenshot of the waveform.
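
The selection logic described here can be checked against a short software model. The Python sketch below is an illustration rather than the Verilog answer, and it rests on two assumptions not spelled out in the question text: the sequential address is CurrentPC + 4, and the branch target is CurrentPC plus the sign-extended immediate shifted left by 2 (suggested by the "Shift left 2" label in the figure). The function name `next_pc` is hypothetical.

```python
MASK64 = (1 << 64) - 1  # keep all arithmetic within 64 bits


def next_pc(current_pc, sign_ext_imm64, branch, alu_zero, uncond_branch):
    """Return the next PC as a 64-bit unsigned value.

    sign_ext_imm64 is the already sign-extended immediate, given as a
    64-bit two's-complement bit pattern.
    """
    sequential = (current_pc + 4) & MASK64                        # first adder
    target = (current_pc + ((sign_ext_imm64 << 2) & MASK64)) & MASK64  # second adder
    take_branch = uncond_branch or (branch and alu_zero)
    return target if take_branch else sequential


if __name__ == "__main__":
    print(hex(next_pc(0x10, 0x3, branch=True, alu_zero=True, uncond_branch=False)))
    print(hex(next_pc(0x10, 0x3, branch=True, alu_zero=False, uncond_branch=False)))
```

Cases worth adding to a testbench follow directly from the model: a conditional branch with Zero high and low, an unconditional branch, and a negative (backward) offset.
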
Q15: Description of the IS Architecture, Software and Database Components, and Hardware Architecture
The information system designed for this case should connect docking stations, as well as manage user registrations and billing. We learned about different architecture models (e.g. centralized, distributed, and cloud).
Directions: In a Word document, provide the following information:
• Propose the IS architecture for the bike sharing system.
• Explain the rationale behind the choice.
• List all the software, database, and hardware components.
• Draw an architecture diagram of the system showing the components using Word shapes; use arrows to show business processes and the major data flows between them.
Format: APA, 550 words, double spacing. References do not count toward the 550 words.
Q16: You have the following circuit:
[Circuit diagram: inputs including A, C, and D; output Y. The gate-level detail was not recoverable.]
1) Code it using behavioral modeling. Show a screenshot of your code.
2) Write a testbench for the code of Part 1. Show screenshots of the code and the waveform, clearly showing the results.
3) Code it using structural modeling. Show a screenshot of your code.
4) Write a testbench for the code of Part 3. Show screenshots of the code and the waveform, clearly showing the results.
Use the EDA Playground website to do this assignment: https://www.edaplayground.com/
Q17: CS370 Lab Assignment #2
Goal: The purpose of this project is to demonstrate the use of Boolean logic and build basic circuits. Note that this lab assignment is a group assignment, with each group having either 2 or 3 students.
Problem Statement: The predominant storage inside computer systems is on disk drives. SCSI disks are the standard disks in most Unix workstations from Sun, HP, SGI, and other vendors. They are also the standard disks in Macintoshes and higher-end Intel PCs, especially network servers. Consider the Wide Ultra4 SCSI, which transfers data packets in 16-bit bursts at 160 MHz with a maximum throughput of 320 MB/sec. Data transfers at higher rates can suffer random noise pulses that change a 0 to a 1 or a 1 to a 0. As the speed of processors and electronic communications increases, these flips become more prevalent, and the inability to detect when these errors occur can be fatal. As a Design Engineer, you have been requested to create a system for the transmission of these 8-bit packets from an I/O Controller to Memory using Error Correcting Code over a 12-bit data bus. Wide SCSI contains a 68-bit bus; however, for the sake of simplification we are only concerned with the data bits. The other bits in the SCSI bus are for bus arbitration, synchronization, power management, etc. In this project, we will use even parity.
[Figure: I/O Controller, transmission bus, Memory, and hex displays]
Vectored Bit: A 4-bit Parity Vector (P₁-P₄) is interlaced with the 8-bit Data Vector (D₁-D₈):
P₁ P₂ D₁ P₃ D₂ D₃ D₄ P₄ D₅ D₆ D₇ D₈
1) Create an ECC Generator at the I/O Controller from the 8-bit Data Vector. The output of the ECC Generator will be the 4-bit Parity Vector.
2) Construct a 12-bit Data Transmission bus to send the binary data and parity bits over to Memory.
3) Construct an ECC Detector at Main Memory that corrects single-bit errors.
Generally, an interrupt/error handler is used to handle errors from the OS. For this exercise we will use 3 hex displays, 2 for data and 1 for an error status, for diagnostic purposes:
• In the event that no error has occurred, your design must display the data transferred using the 2 hex data displays and a "0" as the error status.
• For single-bit transmission errors, your system must correct the error and display the data along with "C" in the 3rd hex display.
• For multiple-bit transmission errors, your design must display "E" in the error status display.
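
The bit layout above matches a standard Hamming code with parity bits at positions 1, 2, 4, and 8, so the generator and detector can be prototyped in software before building the circuit. The Python sketch below is one possible model under that assumption, using even parity as the assignment states. The function names and the status letters' mapping to syndrome values are illustrative choices, not part of the assignment.

```python
PARITY_POS = (1, 2, 4, 8)                  # P1..P4 (1-based bus positions)
DATA_POS = (3, 5, 6, 7, 9, 10, 11, 12)     # D1..D8


def encode(data_bits):
    """Build the 12-bit word P1 P2 D1 P3 D2 D3 D4 P4 D5 D6 D7 D8 (even parity)."""
    word = [0] * 13                        # index 0 unused, positions 1..12
    for pos, bit in zip(DATA_POS, data_bits):
        word[pos] = bit
    for p in PARITY_POS:
        parity = 0
        for pos in range(1, 13):
            if pos & p and pos != p:       # positions this parity bit covers
                parity ^= word[pos]
        word[p] = parity
    return word[1:]


def decode(codeword):
    """Return (data_bits, status) with status '0', 'C', or 'E'."""
    word = [0] + list(codeword)
    syndrome = 0
    for p in PARITY_POS:
        parity = 0
        for pos in range(1, 13):
            if pos & p:
                parity ^= word[pos]
        if parity:                         # even parity violated for this group
            syndrome |= p
    if syndrome == 0:
        status = "0"
    elif syndrome <= 12:
        word[syndrome] ^= 1                # syndrome names the flipped position
        status = "C"
    else:
        status = "E"                       # points outside the 12-bit word
    return [word[pos] for pos in DATA_POS], status


if __name__ == "__main__":
    sent = encode([1, 0, 1, 1, 0, 0, 1, 0])
    received = sent[:]
    received[6] ^= 1                       # flip position 7 (D4)
    print(decode(received))
```

One limitation to keep in mind: with only four parity bits and no overall parity bit, this model flags a multi-bit error as "E" only when the syndrome falls outside positions 1 to 12. Many double-bit errors produce a valid-looking syndrome and would be miscorrected, so a design that must catch them reliably needs additional logic.
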
Q18: A MOV assembly instruction copies the value of the source register to the destination register. What is the value of the destination register r1 after the following instruction is executed?
Memory Address | Assembly Instruction
0x08000166 | MOV r1, pc
Q19: Instructions: Create an abstract (word limit: 650-700 words). Please provide information about the project as follows:
• It should have a title.
• What is the project all about?
• Why do you want to do this specific project?
• What will you implement?
• What tools are needed to implement it?
• Formal references.
Note: You can work on any IEEE-related paper.
Q20: 3. Step 3a: The heart of the mini-project. VM24 only. Due Sunday, 4/7 by 11:59pm.
Design the microarchitecture interpreter for executing/interpreting the above macro-program by:
a. First, design the virtual hardware (Level 1), the CPU, for executing each micro-architecture instruction based on the Instruction Set given to you below. The design of your virtual machine (CPU) would be pseudocode in the C, C++, or Java programming language. Your virtual machine should include all the necessary registers and other data structures needed for the task. (See the instruction set and instruction format in section 4, and the types of special and general-purpose registers referenced in the above assembly code.)
b. The OS machine level (sitting on top of your ISA level) would be a 'driver', a pseudocode of the master program, that simply allocates space (RAM memory) for the various parts of the program listed in section 2 above. (The OSML doesn't accomplish much at this point, with no fancy memory management, scheduling, or synchronization, since you have only one program/job's execution to simulate.)
c. All the variables, registers, etc. in the driver and the virtual CPU define the state of the VM24 computer. For example, the PC is part of the state. Therefore, it is instructive to define a data structure, called a PCB (process control block), to maintain the state of the computer.
The following schematic, for the VM24 architecture, is a simple guide for you to design and implement each component of the architecture. Note: You have a choice between C, C++, or Java for the pseudocode. The focus is on the OS driver, the CPU and its components, and the RAM (skip the 'Disk'); have the Loader load the 'microprogram' into the RAM. [Note: the 32-bit microinstructions will assist in the instruction 'Decode' stage.] Step 2b above should guide you in getting the Decoder's logic right. The Execute stage will serve as the ALU (made of 'functions', or opcodes), which will 'interpret' each instruction of the program listed in the Assembly/Hex form. The Long-Term and the Short-Term/Dispatch components are not needed, except the PCB, which carries the 'state' of the VM24. There will be no context switching in this exercise.
[Figure: VM24 architecture schematic showing the program file, Loader, disk, long-term and short-term schedulers, OS driver, RAM, PCB, and the CPU with its Fetch, Decode, and Execute stages, PC, and general-purpose registers]

TutorBin Testimonials

I found TutorBin Computer Organisation And Architecture homework help when I was struggling with complex concepts. Experts provided step-wise explanations and examples to help me understand concepts clearly.

Rick Jordon

TutorBin experts resolve your doubts without making you wait for long. Their experts are responsive & available 24/7 whenever you need Computer Organisation And Architecture subject guidance.

Andrea Jacobs

I trust TutorBin for assisting me in completing Computer Organisation And Architecture assignments with quality and 100% accuracy. Experts are polite, listen to my problems, and have extensive experience in their domain.

Lilian King

I got my Computer Organisation And Architecture homework done on time. My assignment is proofread and edited by professionals. Got zero plagiarism as experts developed my assignment from scratch. Feel relieved and super excited.

Joey Dip

Popular Subjects for Computer Organisation and Architecture

You can get the best-rated step-by-step problem explanations from 65,000+ expert tutors by ordering TutorBin computer organisation and architecture homework help.

TutorBin helping students around the globe

TutorBin believes that distance should never be a barrier to learning. With 500,000+ orders and 100,000+ happy customers, TutorBin has become the name that keeps learning fun in the UK, USA, Canada, Australia, Singapore, and UAE.