Data Processing in Shell

By: DataCamp

Overview

Learn powerful command-line skills to download, process, and transform data, and to build a machine learning pipeline.

We live in a busy world with tight deadlines. As a result, we fall back on what is familiar and easy, favoring graphical interfaces like Anaconda and RStudio. However, taking the time to learn data analysis on the command line is a great long-term investment because it makes us stronger and more productive data practitioners.




In this course, we will take a practical approach to learning simple, powerful, and data-specific command-line skills. Using publicly available Spotify datasets, we will learn how to download, process, clean, and transform data, all via the command line. We will also learn advanced techniques such as command-line-based SQL database operations. Finally, we will combine the power of the command line and Python to build a data pipeline for automating a predictive model.

Syllabus

Downloading Data on the Command Line
-In this chapter, we learn how to download data files from web servers via the command line. In the process, we also learn about documentation manuals, option flags, and multi-file processing.

Data Cleaning and Munging on the Command Line
-We continue our data journey from data downloading to data processing. In this chapter, we use the command-line library csvkit to convert, preview, filter, and manipulate files to prepare our data for further analysis.

Database Operations on the Command Line
-In this chapter, we dig deeper into all that the csvkit library has to offer. In particular, we focus on database operations we can do on the command line, including table creation, data pulls, and various ETL transformations.

Data Pipeline on the Command Line
-In the last chapter, we connect the command line with other data science languages and learn how they can work together. Using Python as a case study, we learn to execute Python on the command line, to install dependencies using the package manager pip, and to build an entire model pipeline using the command line.
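
As a rough illustration of how a Python script can act as one stage of a command-line pipeline, here is a minimal sketch. It is not taken from the course: the file name `filter_tracks.py`, the `danceability` column, and the sample rows are all hypothetical. It uses only Python's standard `csv` module.

```python
"""filter_tracks.py: hypothetical pipeline stage; names and data are assumptions."""
import csv
import io
import sys


def filter_rows(src, dst, column, minimum):
    """Copy CSV rows from src to dst where float(row[column]) >= minimum."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    kept = 0
    for row in reader:
        if float(row[column]) >= minimum:
            writer.writerow(row)
            kept += 1
    return kept


def demo():
    """Run the filter on a tiny in-memory sample (made-up values)."""
    sample = io.StringIO("track,danceability\nA,0.9\nB,0.4\nC,0.75\n")
    out = io.StringIO()
    kept = filter_rows(sample, out, "danceability", 0.7)
    return kept, out.getvalue()


def main(argv):
    # With two arguments (column name, minimum value), filter stdin to stdout.
    # Otherwise fall back to the in-memory demo so the script runs on its own.
    if len(argv) == 2:
        filter_rows(sys.stdin, sys.stdout, argv[0], float(argv[1]))
    else:
        print(demo())


if __name__ == "__main__":
    main(sys.argv[1:])
```

Under these assumptions, the script could be called from the shell as `python filter_tracks.py danceability 0.7 < input.csv > output.csv`, where both file names are placeholders. Reading from standard input and writing to standard output is what lets the script be chained with other command-line tools.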

Taught by

Susan Sun


  • DataCamp
  • Paid
  • English
  • Certificate available
  • Available anytime
  • Everyone
  • N/A