Ne-XVP

Ne-XVP was a research project carried out from 2006 to 2008 at NXP Semiconductors. The project took a holistic approach to defining a next-generation multimedia processing architecture for embedded MPSoCs, targeting programmability, performance scalability, and silicon efficiency in an evolutionary way. Evolutionary here means reusing existing processor cores, such as the NXP TriMedia, as building blocks and supporting industry programming standards such as POSIX threads. Based on a technology-aware design space exploration, the project concluded that hardware accelerators for task management and cache coherence, coupled with appropriately dimensioned compute cores, deliver good programmability, scalable performance, and competitive silicon efficiency.

Research

Ne-XVP architecture at the end of 2008. Two different core types, core1 and core2, are used to construct a multicore processor. To increase performance density, the multicore is supported by several accelerators for inter-thread synchronization and communication. For example, the Hardware Task Scheduler can schedule tasks for many complex multimedia applications, and the cache coherence coprocessors enable inter-thread communication via shared memory.
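As a rough software analogy for what a task scheduler does, the following sketch runs a small task graph on worker threads that communicate through shared memory. It is not the Ne-XVP hardware design: the function name `run_task_graph`, the dependency-counter scheme, and the example graph are assumptions made purely for illustration.

```python
import queue
import threading


def run_task_graph(deps, work, num_workers=4):
    """Run tasks in dependency order on a pool of worker threads.

    deps maps each task name to the list of tasks it depends on.
    work(task) is called once per task, after all its dependencies finished.
    Returns the order in which tasks completed.
    The graph must be acyclic; a cycle would leave workers waiting forever.
    """
    if not deps:
        return []

    # Number of unfinished dependencies per task (hypothetical bookkeeping).
    remaining = {task: len(parents) for task, parents in deps.items()}
    dependents = {task: [] for task in deps}
    for task, parents in deps.items():
        for parent in parents:
            dependents[parent].append(task)

    ready = queue.Queue()      # tasks whose dependencies are all satisfied
    lock = threading.Lock()    # protects the shared bookkeeping below
    finished = []

    for task, count in remaining.items():
        if count == 0:
            ready.put(task)

    def worker():
        while True:
            task = ready.get()
            if task is None:   # sentinel: all tasks are done
                return
            work(task)
            with lock:
                finished.append(task)
                for child in dependents[task]:
                    remaining[child] -= 1
                    if remaining[child] == 0:
                        ready.put(child)
                if len(finished) == len(deps):
                    # Last task done: wake every worker so it can exit.
                    for _ in range(num_workers):
                        ready.put(None)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return finished


if __name__ == "__main__":
    # Hypothetical four-task graph: d needs b and c, which both need a.
    graph = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
    print(run_task_graph(graph, lambda task: None))
```

In this software version, every task completion pays for a lock and a queue operation. That bookkeeping is the kind of overhead a dedicated scheduling accelerator aims to take off the compute cores.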

Ne-XVP's research subjects and corresponding publications:

Asymmetric multicore architecture with generic accelerators ^[1]
Hardware multithreading in VLIWs ^[2]
Low-complexity cache coherence ^[1]
Hardware accelerators for task scheduling and synchronization:
1. A Hardware Task Scheduler ^[3]
2. A Hardware Synchronization Unit for synchronizing threads ^[1]^[2]
3. Task Management Unit ^[4]
Instruction cache sharing ^[1]
Design Space Exploration with Performance Density as the optimization function ^[1]
Technology modeling for embedded processors ^[1]^[5]^[6]
Parallelization of complex multimedia algorithms (H.264, Frame Rate Conversion) ^[7]^[8]^[9]^[10]
Auto-parallelizing compilers
Time-aware programming languages in cooperation with the ACOTES project ^[11]
Visual programming
Task-level speculation
Porting GCC to Exposed Pipeline VLIW Processors ^[12]
Multiprogram workload for embedded processing
A 1-GHz embedded VLIW processor
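
To illustrate a design space exploration that uses performance density as its objective, the sketch below enumerates mixes of two core types and keeps the mix with the highest performance per unit area. Everything in it is an assumption for illustration: performance density is taken to mean performance divided by silicon area, the Amdahl-style cost model is a toy stand-in for the project's technology model, and the function `explore` and all numbers are hypothetical.

```python
from itertools import product


def explore(core_types, max_per_type, uncore_area, parallel_fraction):
    """Return (best_density, best_config) over all core mixes.

    core_types maps a core name to (performance, area) per core.
    Toy model (an assumption): the serial part of the workload runs on the
    fastest core present, the parallel part on all cores together.
    """
    names = list(core_types)
    best = None
    for counts in product(range(max_per_type + 1), repeat=len(names)):
        if sum(counts) == 0:
            continue  # a chip without cores is not a valid design point
        present = [(name, n) for name, n in zip(names, counts) if n > 0]
        fastest = max(core_types[name][0] for name, _ in present)
        total = sum(core_types[name][0] * n for name, n in present)
        time = (1.0 - parallel_fraction) / fastest + parallel_fraction / total
        performance = 1.0 / time
        area = uncore_area + sum(core_types[name][1] * n for name, n in present)
        density = performance / area
        if best is None or density > best[0]:
            best = (density, dict(zip(names, counts)))
    return best


if __name__ == "__main__":
    # Hypothetical cores: core1 is faster but larger than core2.
    cores = {"core1": (2.0, 3.0), "core2": (1.0, 1.0)}
    density, config = explore(cores, max_per_type=2, uncore_area=2.0,
                              parallel_fraction=0.9)
    print(config, round(density, 4))
```

With a model of this shape, a mix of one larger core and several smaller ones can beat either homogeneous design, because the larger core shortens the serial part while the smaller cores add parallel throughput at low area cost.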

Project members

Ne-XVP team at the end of 2008. (left-to-right, top-to-bottom) Surendra Guntur, Jan Hoogerbrugge, Ghiath Al-Kadi, Marc Duranton, Andrei Terechko, Anirban Lahiri.

Ghiath Al-Kadi
Zbigniew Chamski
Dmitry Cheresiz
Marc Duranton (project leader)
Surendra Guntur
Jan Hoogerbrugge
Anirban Lahiri
Ondrej Popp
Andrei Terechko
Alex Turjan
Clemens Wust
...

References

1. A. Terechko, J. Hoogerbrugge, G. Alkadi, S. Guntur, A. Lahiri, M. Duranton, C. Wust, P. Christie, A. Nackaerts, A. Kumar, "Balancing programmability and silicon efficiency of heterogeneous multicore architectures", ACM Transactions on Embedded Computing Systems, Special Issue on Real-time Multimedia, 2010.
2. J. Hoogerbrugge, A. Terechko, "A multithreaded multicore system for embedded media processing", Transactions on High-Performance Embedded Architectures and Compilers, Volume 4, Issue 2, 2008.
3. G. Al-Kadi, A. S. Terechko, "A Hardware Task Scheduler for Embedded Video Processing", in Proceedings of the 4th International Conference on High Performance and Embedded Architectures and Compilers, Paphos, Cyprus, January 25–28, 2009.
4. M. Sjalander, A. Terechko, M. Duranton, "A Look-Ahead Task Management Unit for Embedded Multi-Core Architectures", in Proceedings of the 2008 11th EUROMICRO Conference on Digital System Design Architectures, Methods and Tools, pp. 149-157, 2008, ISBN 978-0-7695-3277-6, IEEE Computer Society, Washington, DC, USA.
5. A. Terechko, J. Hoogerbrugge, G. Al-Kadi, A. Lahiri, S. Guntur, M. Duranton, P. Christie, A. Nackaerts, A. Kumar, "Performance Density Exploration of Heterogeneous Multicore Architectures", invited presentation at Rapid Simulation and Performance Evaluation: Methods and Tools (RAPIDO'09), January 25, 2009, in conjunction with the 4th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), Paphos, Cyprus, January 25–28, 2009.
6. P. Christie, A. Nackaerts, A. Kumar, A. S. Terechko, G. Doornbos, "Rapid Design Flows for Advanced Technology Pathfinding", invited paper, International Electron Devices Meeting, San Francisco, 2008.
7. G. Al-Kadi, J. Hoogerbrugge, S. Guntur, A. Terechko, M. Duranton, "Meandering based parallel 3DRS algorithm for the multicore era", in IEEE International Conference on Consumer Electronics, Las Vegas, USA, January 11–13, 2010.
8. A. Azevedo, B. Juurlink, C. Meenderinck, A. Terechko, J. Hoogerbrugge, M. Alvarez, A. Ramirez, M. Valero, "A Highly Scalable Parallel Implementation of H.264", in Transactions on High-Performance Embedded Architectures and Compilers, Volume 4, Issue 2, pp. 404-418, 2009.
9. A. Azevedo, C. Meenderinck, B. Juurlink, A. Terechko, J. Hoogerbrugge, M. Alvarez, A. Ramirez, "Parallel H.264 Decoding on an Embedded Multicore Processor", in Proceedings of the 4th International Conference on High Performance and Embedded Architectures and Compilers, Paphos, Cyprus, January 2009.
10. M. Alvarez, A. Azevedo, C. Meenderinck, B. Juurlink, A. Terechko, J. Hoogerbrugge, A. Ramirez, "Analyzing Scalability Limits of H.264 Decoding Due to TLP Overhead", in Proceedings of 6th HiPEAC Industrial Workshop, November 2008.
11. ACOTES: http://www.hitech-projects.com/euprojects/ACOTES/
12. A. Turjan, D. Cheresiz, "Porting GCC to an exposed pipeline vector VLIW processor", GCC Developer's summit, Montreal, Québec, Canada, June 8–10, 2009.